Quality-biased ranking of web documents - Citegraph

Paper Info

Title
Quality-biased ranking of web documents

Abstract
Many existing retrieval approaches do not take into account the content quality of the retrieved documents, although link-based measures such as PageRank are commonly used as a form of document prior. In this paper, we present the quality-biased ranking method that promotes documents containing high-quality content, and penalizes low-quality documents. The quality of the document content can be determined by its readability, layout and ease-of-navigation, among other factors. Accordingly, instead of using a single estimate for document quality, we consider multiple content-based features that are directly integrated into a state-of-the-art retrieval method. These content-based features are easy to compute, store and retrieve, even for large web collections. We use several query sets and web collections to empirically evaluate the performance of our quality-biased retrieval method. In each case, our method consistently improves by a large margin the retrieval performance of text-based and link-based retrieval methods that do not take into account the quality of the document content.
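
The abstract gives no formula, so the following is only an illustrative sketch of the general idea of biasing a text-based score with several content-quality features. It is not the authors' model: the linear combination, the feature names (`readability`, `layout`, `navigation`), the weights, and the `quality_strength` parameter are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    # Query-dependent relevance score from any text-based ranker (assumed given).
    text_score: float
    # Hypothetical content-quality features, each scaled to [0, 1].
    quality: dict = field(default_factory=dict)


# Hypothetical weights; a real system would presumably learn them from
# relevance judgments rather than fix them by hand.
QUALITY_WEIGHTS = {"readability": 0.5, "layout": 0.3, "navigation": 0.2}


def quality_biased_score(doc, weights=QUALITY_WEIGHTS, quality_strength=1.0):
    """Add a weighted sum of quality features to the text score."""
    quality_term = sum(w * doc.quality.get(name, 0.0) for name, w in weights.items())
    return doc.text_score + quality_strength * quality_term


def rank(docs, **kwargs):
    """Sort documents by descending quality-biased score."""
    return sorted(docs, key=lambda d: quality_biased_score(d, **kwargs), reverse=True)


docs = [
    Document("spammy", 2.0, {"readability": 0.1, "layout": 0.2, "navigation": 0.0}),
    Document("clean", 1.8, {"readability": 0.9, "layout": 0.8, "navigation": 0.7}),
]
ranked_ids = [d.doc_id for d in rank(docs)]
```

In this toy setup the slightly less relevant but higher-quality page is ranked first, while setting `quality_strength=0.0` falls back to ordering by text score alone.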

Year	DOI	Venue
2011	10.1145/1935826.1935849	WSDM

Keywords	Field	DocType
the-art retrieval method,low-quality document,high-quality content,link-based retrieval method,retrieval performance,quality-biased retrieval method,quality-biased ranking,existing retrieval approach,document quality,web document,content quality,document content	Data mining,PageRank,Information retrieval,Ranking,Document clustering,Computer science,Readability,Ranking (information retrieval),Document quality	Conference

Citations	PageRank	References
72	1.85	31

Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Michael Bendersky	1	986	48.69
W. Bruce Croft	2	17812	2796.94
Yanlei Diao	3	2234	108.95
