Title
Effective top-k computation with term-proximity support
Abstract
Modern web search engines are expected to return the top-k results efficiently. Although many dynamic index pruning strategies have been proposed for efficient top-k computation, most of them tend to ignore important ranking factors such as term-proximity (the distance relationship between query terms in a document). In our recent work [Zhu, M., Shi, S., Li, M., & Wen, J. (2007). Effective top-k computation in retrieving structured documents with term-proximity support. In Proceedings of 16th CIKM conference (pp. 771-780)], we demonstrated that, when term-proximity is incorporated into ranking functions, most existing index structures and top-k strategies become quite inefficient. To solve this problem, we built the inverted index based on web page structure and proposed corresponding query processing strategies. The experimental results indicated that the proposed index structures and query processing strategies significantly improve top-k efficiency. In this paper, we study whether additional techniques can further improve top-k computation efficiency. We propose a Proximity-Probe Heuristic to make our top-k algorithms more efficient. We also test the efficiency of our approaches under various settings (linear or non-linear ranking functions, exact or approximate top-k processing, etc.).
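To make the notion of term-proximity concrete, the sketch below scores each document by a base relevance value plus a bonus that grows as the query terms occur closer together, then keeps the k best documents. It is only an illustration under stated assumptions: the names `min_span`, `proximity_score`, and `top_k`, the `weight / span` bonus, and the exhaustive scan are hypothetical choices, not the paper's ranking function, index structure, or Proximity-Probe Heuristic, and no index pruning is performed.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Positions = Dict[str, List[int]]  # query term -> token positions in one document


def min_span(positions: Positions) -> Optional[int]:
    """Width in tokens of the smallest window containing every query term.

    Returns None if some query term does not occur in the document.
    """
    events = sorted((p, t) for t, ps in positions.items() for p in ps)
    need = len(positions)
    counts: Dict[str, int] = {}
    best: Optional[int] = None
    left = 0
    for right_pos, term in events:
        counts[term] = counts.get(term, 0) + 1
        # Shrink the window from the left while it still covers all terms.
        while len(counts) == need:
            left_pos, left_term = events[left]
            width = right_pos - left_pos + 1
            if best is None or width < best:
                best = width
            counts[left_term] -= 1
            if counts[left_term] == 0:
                del counts[left_term]
            left += 1
    return best


def proximity_score(base: float, positions: Positions, weight: float = 1.0) -> float:
    """Assumed scoring form: base relevance plus weight / minimal span."""
    span = min_span(positions)
    return base if span is None else base + weight / span


def top_k(docs: Dict[str, Tuple[float, Positions]], k: int) -> List[Tuple[float, str]]:
    """Score every document exhaustively and return the k highest (score, doc_id) pairs."""
    scored = ((proximity_score(base, pos), doc_id) for doc_id, (base, pos) in docs.items())
    return heapq.nlargest(k, scored)


# Hypothetical documents with equal base scores; only proximity differs.
docs = {
    "d1": (1.0, {"top": [0, 10], "k": [1]}),  # terms adjacent: span 2
    "d2": (1.0, {"top": [0], "k": [9]}),      # terms far apart: span 10
}
best = top_k(docs, 1)
```

With equal base scores, the document whose query terms sit next to each other ranks first. Because the bonus depends on positions inside each document, it cannot be read off per-term score bounds alone, which gives some intuition for why proximity makes pruning-based top-k strategies harder to apply.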
Year: 2009
DOI: 10.1016/j.ipm.2009.04.002
Venue: Inf. Process. Manage.
Keywords: query processing strategy, effective top-k computation, non-linear ranking function, document structure, top-k, approximate top-k, top-k strategy, term-proximity, ranking function, top-k efficiency, top-k algorithm, top-k computation efficiency, top-k result, dynamic index pruning, approximate top-k processing, efficient top-k computation, term-proximity support, proximity-probe, web pages, inverted index, web search engine, indexation
Field: Inverted index, Data mining, Heuristic, Search engine, Web page, Information retrieval, Ranking, Computer science, Document Structure Description, Computation
DocType: Journal
Volume: 45
Issue: 4
Journal: Information Processing and Management
Citations: 4
PageRank: 0.42
References: 24
Authors: 4
Name            Order   Citations   PageRank
Mingjie Zhu     1       89          4.32
Shuming Shi     2       620         58.27
Mingjing Li     3       3076        192.39
Ji-Rong Wen     4       4431        265.98