Title
Efficient online index maintenance for contiguous inverted lists
Abstract
Search engines and other text retrieval systems use high-performance inverted indexes to provide efficient text query evaluation. Algorithms for fast query evaluation and index construction are well-known, but relatively little has been published concerning update. In this paper, we experimentally evaluate the two main alternative strategies for index maintenance in the presence of insertions, with the constraint that inverted lists remain contiguous on disk for fast query evaluation. The in-place and re-merge strategies are benchmarked against the baseline of a complete re-build. Our experiments with large volumes of web data show that re-merge is the fastest approach if large buffers are available, but that even a simple implementation of in-place update is suitable when the rate of insertion is low or memory buffer size is limited. We also show that with careful design of aspects of implementation such as free-space management, in-place update can be improved by around an order of magnitude over a naïve implementation.
Year
DOI
Venue
2006
10.1016/j.ipm.2005.09.005
Inf. Process. Manage.
Keywords
Field
DocType
search engines,efficient text query evaluation,index update,index construction,large volume,fast query evaluation,high-performance inverted index,large buffer,in-place update,index update.,contiguous inverted list,inverted list,index maintenance,simple implementation,text indexing,efficient online index maintenance,search engine,inverted index,indexation
Inverted index,Information system,Data mining,Indexation,Search engine,Information retrieval,Computer science,Contiguity (probability theory),Search engine indexing,Memory buffer register,Text retrieval
Journal
Volume
Issue
ISSN
42
4
Information Processing and Management
Citations 
PageRank 
References 
29
1.14
32
Authors
3
Name
Order
Citations
PageRank
Nicholas Lester11648.41
Justin Zobel26882880.46
Hugh Williams3754.58