Title
Using structural information to improve search in Web collections
Abstract
In this work, we investigate the problem of using the block structure of Web pages to improve ranking results. Starting from the basic intuitions behind term frequency (TF) and inverse document frequency (IDF), we propose nine block-weight functions that distinguish the impact of term occurrences inside individual page blocks, rather than inside whole pages. These functions are then used to compute a modified BM25 ranking function. Using four distinct Web collections, we ran extensive experiments comparing our block-weight ranking formulas with two baselines: (a) a BM25 ranking applied to full pages, and (b) a BM25 ranking that takes into account the best blocks. Our results suggest that our block-weight ranking method outperforms both baselines on all collections we used, with average gains in precision ranging from 5% to 20%. © 2010 Wiley Periodicals, Inc.
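The abstract names three scoring schemes but gives no formulas. As a rough illustration only, the Python sketch below contrasts them under stated assumptions: pages are already segmented into blocks of tokens, BM25 uses the common defaults k1 = 1.2 and b = 0.75, and block weights are supplied by the caller. The names `full_page_score`, `best_block_score`, and `block_weighted_score` are hypothetical; the weights are placeholders rather than any of the paper's nine block-weight functions, and reading baseline (b) as "the score of the single best block" is an assumption.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical data model: a page is a list of blocks, a block is a list of tokens.
Block = List[str]
Page = List[Block]

K1, B = 1.2, 0.75  # common BM25 defaults; assumed, not the paper's settings


def idf(term: str, pages: Sequence[Page]) -> float:
    """Page-level IDF (BM25 variant with +1 inside the log, always >= 0)."""
    n = len(pages)
    df = sum(1 for page in pages if any(term in block for block in page))
    return math.log(1 + (n - df + 0.5) / (df + 0.5))


def bm25(query: Sequence[str], tf: Dict[str, float], length: float,
         avg_length: float, pages: Sequence[Page]) -> float:
    """BM25 over a bag of (possibly weighted) term frequencies."""
    score = 0.0
    for term in query:
        f = tf.get(term, 0.0)
        if f > 0:
            denom = f + K1 * (1 - B + B * length / avg_length)
            score += idf(term, pages) * f * (K1 + 1) / denom
    return score


def page_length(page: Page) -> int:
    return sum(len(block) for block in page)


def full_page_score(query, page, pages):
    """Baseline (a): BM25 on the whole page, block boundaries ignored."""
    tf = Counter(tok for block in page for tok in block)
    avg = sum(page_length(p) for p in pages) / len(pages)
    return bm25(query, tf, page_length(page), avg, pages)


def best_block_score(query, page, pages):
    """Baseline (b), one possible reading: score of the best single block."""
    blocks = [blk for p in pages for blk in p]
    avg = sum(len(blk) for blk in blocks) / len(blocks)
    return max(bm25(query, Counter(blk), len(blk), avg, pages) for blk in page)


def block_weighted_score(query, page, weights, pages):
    """Block-weighted sketch: an occurrence in block i counts weights[i], not 1.

    `weights` is a hypothetical per-block weight list; a real block-weight
    function would compute these values instead of the caller.
    """
    tf: Dict[str, float] = {}
    for block, w in zip(page, weights):
        for tok in block:
            tf[tok] = tf.get(tok, 0.0) + w
    avg = sum(page_length(p) for p in pages) / len(pages)
    return bm25(query, tf, page_length(page), avg, pages)


# Toy corpus (invented): each page has a content block and a menu block.
pages = [
    [["web", "search", "ranking"], ["home", "login", "search"]],
    [["cooking", "recipes"], ["home", "login"]],
    [["sports", "news"], ["search", "home"]],
]
query = ["search", "ranking"]

full = full_page_score(query, pages[0], pages)
flat = block_weighted_score(query, pages[0], [1.0, 1.0], pages)
boosted = block_weighted_score(query, pages[0], [2.0, 0.5], pages)
```

With every weight set to 1.0 the block-weighted score reduces to the full-page baseline, which makes a convenient sanity check. Weighting term frequency, rather than rewriting IDF, keeps BM25's saturation and length normalization intact while letting occurrences in some blocks count more than others.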
Year: 2010
DOI: 10.1002/asi.v61:12
Venue: JASIST (Journal of the American Society for Information Science and Technology)
Keywords: frames, web pages, weighting, information content
Field: Data mining, Block structure, Weighting, Ranking, Information retrieval, Word lists by frequency, Web page, tf–idf, Computer science, Intuition, Ranking (information retrieval)
DocType: Journal
Volume: 61
Issue: 12
ISSN: 1532-2882
Citations: 21
PageRank: 0.60
References: 12
Authors: 5
1. Edleno Silva de Moura
2. David Fernandes
3. Berthier Ribeiro-Neto
4. Altigran Soares da Silva
5. Marcos André Gonçalves