Title
On characterizing and computing the diversity of hyperlinks for anti-spamming page ranking
Abstract
With the advent of the big data era, querying useful information efficiently and effectively on the Web, the largest heterogeneous data source in the world, is becoming increasingly challenging. Page ranking is an essential component of search engines because it determines the order in which the tens of millions of pages returned for a single query are presented. It therefore plays a significant role in determining search quality and user experience in information retrieval. When measuring the authority of a web page, most methods focus on the quantity and quality of the neighboring pages that link to it through inbound hyperlinks. However, these methods ignore the diversity of those neighboring pages, which we believe is an important metric for objectively evaluating web page authority. True authority pages usually receive a large number of inbound hyperlinks from a wide variety of sources. In contrast, fake authorities, which boost their page rank using techniques such as link farms, can hardly attain a high diversity of inbound hyperlinks because the cost is prohibitively high. We propose a probabilistic counting-based method to compute the diversity of inbound hyperlinks quantitatively and efficiently. We then propose a novel link-based ranking algorithm, named Drank, that ranks pages by simultaneously analyzing the quantity, quality and diversity of their inbound hyperlinks. Validation on both synthetic and real-world data shows that Drank outperforms other state-of-the-art methods in both finding high-quality pages and suppressing web spam.
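The abstract does not spell out how the diversity of inbound hyperlinks is counted. As a rough illustration only, the sketch below uses linear counting, one well-known member of the probabilistic counting family, to estimate how many distinct hosts link to a page. Treating the host name as the unit of "source", the bitmap size `num_bits`, and the function names `source_host`, `estimate_distinct` and `inbound_diversity` are all assumptions made for this example; they are not taken from the paper and do not reproduce Drank.

```python
import hashlib
import math
from urllib.parse import urlsplit


def source_host(url: str) -> str:
    """Return the lowercase host of a URL (assumed unit of 'source')."""
    return (urlsplit(url).hostname or "").lower()


def estimate_distinct(items, num_bits: int = 1024) -> float:
    """Linear counting: estimate the number of distinct items from a bitmap."""
    bitmap = [False] * num_bits
    for item in items:
        # A stable hash keeps the estimate reproducible across runs.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        bitmap[int.from_bytes(digest[:8], "big") % num_bits] = True
    zero_bits = bitmap.count(False)
    if zero_bits == 0:
        # Bitmap saturated: num_bits is too small for this input.
        return float("inf")
    # Estimator n ~ -m * ln(V), where V is the fraction of bits still zero.
    return -num_bits * math.log(zero_bits / num_bits)


def inbound_diversity(inbound_urls, num_bits: int = 1024) -> float:
    """Estimate how many distinct hosts link to a page (hypothetical metric)."""
    return estimate_distinct((source_host(u) for u in inbound_urls), num_bits)


if __name__ == "__main__":
    # Hypothetical data: a link farm reuses two hosts, an organic page does not.
    farm = [f"http://farm{i % 2}.example/p{i}" for i in range(500)]
    organic = [f"http://site{i}.example/" for i in range(200)]
    print("link farm diversity:", round(inbound_diversity(farm), 1))
    print("organic diversity:", round(inbound_diversity(organic), 1))
```

In this toy setting the link farm has more inbound links than the organic page but a far lower diversity estimate, which is the contrast the abstract relies on. The bitmap uses fixed memory regardless of how many links are scanned, at the price of an approximate count.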
Year: 2015
DOI: 10.1016/j.knosys.2014.12.028
Venue: Knowledge-Based Systems
Keywords: search engine
Field: Data mining, Printer-friendly, User experience design, World Wide Web, Information retrieval, Ranking, Web page, Computer science, Hyperlink, Link farm, Big data, Spamming
DocType: Journal
Volume: 77
Issue: C
ISSN: 0950-7051
Citations: 2
PageRank: 0.36
References: 21
Authors: 5
Name | Order | Citations | PageRank
Bo Yang | 1 | 822 | 64.08
Hechang Chen | 2 | 18 | 9.53
Xuehua Zhao | 3 | 4 | 2.80
Masato Naka | 4 | 2 | 0.36
Jing Huang | 5 | 10 | 2.21