Title
Quality-biased ranking of web documents
Abstract
Many existing retrieval approaches do not take into account the content quality of the retrieved documents, although link-based measures such as PageRank are commonly used as a form of document prior. In this paper, we present the quality-biased ranking method that promotes documents containing high-quality content, and penalizes low-quality documents. The quality of the document content can be determined by its readability, layout and ease-of-navigation, among other factors. Accordingly, instead of using a single estimate for document quality, we consider multiple content-based features that are directly integrated into a state-of-the-art retrieval method. These content-based features are easy to compute, store and retrieve, even for large web collections. We use several query sets and web collections to empirically evaluate the performance of our quality-biased retrieval method. In each case, our method consistently improves by a large margin the retrieval performance of text-based and link-based retrieval methods that do not take into account the quality of the document content.
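The idea of folding several content-based quality features into a retrieval score can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three features (average term length, stopword fraction, term entropy), the tiny stopword list, the linear combination, and the names `quality_features`, `quality_biased_score` and `rank` are hypothetical and are not taken from the paper, which integrates its features into a state-of-the-art retrieval method that the abstract does not specify.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = frozenset(
    {"the", "a", "an", "of", "and", "to", "in", "is", "for", "that", "on", "with"}
)


def quality_features(text):
    """Compute a few cheap content-based features for one document.

    The feature set is illustrative only; it stands in for readability-style
    signals that can be computed from the document text alone.
    """
    tokens = text.lower().split()
    if not tokens:
        return {"avg_term_length": 0.0, "stopword_fraction": 0.0, "entropy": 0.0}
    n = len(tokens)
    counts = Counter(tokens)
    # Shannon entropy of the term distribution; repetitive text scores low.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "avg_term_length": sum(len(t) for t in tokens) / n,
        "stopword_fraction": sum(t in STOPWORDS for t in tokens) / n,
        "entropy": entropy,
    }


def quality_biased_score(text_score, features, weights):
    """Add a weighted sum of quality features to a text-based relevance score.

    A plain linear combination is assumed here for simplicity.
    """
    return text_score + sum(weights[name] * value for name, value in features.items())


def rank(candidates, weights):
    """Rank (doc_id, text, text_score) triples by quality-biased score, best first."""
    scored = [
        (doc_id, quality_biased_score(score, quality_features(text), weights))
        for doc_id, text, score in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With equal text scores, a document made of one repeated keyword has zero entropy and no stopwords, so under positive weights on those two features it ranks below ordinary prose. In practice the weights would be learned from relevance judgments rather than set by hand.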
Year
2011
DOI
10.1145/1935826.1935849
Venue
WSDM
Keywords
the-art retrieval method,low-quality document,high-quality content,link-based retrieval method,retrieval performance,quality-biased retrieval method,quality-biased ranking,existing retrieval approach,document quality,web document,content quality,document content
Field
Data mining,PageRank,Information retrieval,Ranking,Document clustering,Computer science,Readability,Ranking (information retrieval),Document quality
DocType
Conference
Citations
72
PageRank
1.85
References
31
Authors
3
Name, Order, Citations and PageRank
Michael Bendersky, 1, 98648.69
W. Bruce Croft, 2, 178122796.94
Yanlei Diao, 3, 2234108.95