Title
Compression of inverted indexes for fast query evaluation
Abstract
Compression reduces both the size of indexes and the time needed to evaluate queries. In this paper, we revisit the compression of inverted lists of document postings that store the position and frequency of indexed terms, considering two approaches to improving retrieval efficiency: better implementation and better choice of integer compression schemes. First, we propose several simple optimisations to well-known integer compression schemes, and show experimentally that these lead to significant reductions in time. Second, we explore the impact of the choice of compression scheme on retrieval efficiency. In experiments on large collections of data, we show two surprising results: use of simple byte-aligned codes halves the query evaluation time compared to the most compact Golomb-Rice bitwise compression schemes; and, even when an index fits entirely in memory, byte-aligned codes result in faster query evaluation than does an uncompressed index, emphasising that the cost of transferring data from memory to the CPU cache is less for an appropriately compressed index than for an uncompressed index. Moreover, byte-aligned schemes have only a modest space overhead: the most compact schemes result in indexes that are around 10% of the size of the collection, while a byte-aligned scheme is around 13%. We conclude that fast byte-aligned codes should be used to store integers in inverted lists.
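To make the contrast between byte-aligned and bitwise codes concrete, here is a minimal sketch (not the authors' implementation, whose optimisations the abstract does not detail). It assumes document numbers in a posting list are stored as d-gaps (differences between consecutive document numbers), and it uses one common variable-byte convention in which the high bit flags the last byte of each integer. The Rice code uses a divisor of 2**k, with the quotient in unary and the remainder in k bits; the parameter k is simply passed in here, whereas in practice it is usually chosen per list from the average gap. All function names are hypothetical, and the Rice output is a '0'/'1' string for readability rather than packed machine words.

```python
from typing import List


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Convert strictly increasing document numbers to d-gaps (first gap measured from 0)."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps: List[int]) -> List[int]:
    """Prefix-sum d-gaps back into document numbers."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


def vbyte_encode(values: List[int]) -> bytes:
    """Byte-aligned code: 7 data bits per byte; high bit set on the final byte of each integer."""
    out = bytearray()
    for n in values:
        while n >= 128:
            out.append(n & 0x7F)  # low 7 bits, high bit clear: more bytes follow
            n >>= 7
        out.append(n | 0x80)      # last byte of this integer carries the stop flag
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Inverse of vbyte_encode: every step works on a whole byte."""
    values, n, shift = [], 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:              # stop flag: integer is complete
            values.append(n)
            n, shift = 0, 0
        else:
            shift += 7
    return values


def rice_encode(values: List[int], k: int) -> str:
    """Bitwise Rice code with divisor 2**k, returned as a '0'/'1' string."""
    parts = []
    for n in values:
        q, r = n >> k, n & ((1 << k) - 1)
        parts.append("1" * q + "0")             # quotient in unary, terminated by '0'
        if k:
            parts.append(format(r, f"0{k}b"))   # remainder in exactly k bits
    return "".join(parts)


def rice_decode(bits: str, k: int, count: int) -> List[int]:
    """Inverse of rice_encode: must scan individual bits to find each unary quotient."""
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1                                # skip the terminating '0'
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append((q << k) | r)
    return values


if __name__ == "__main__":
    docs = [3, 7, 11, 120, 130, 1000]
    gaps = to_gaps(docs)
    vb = vbyte_encode(gaps)
    rb = rice_encode(gaps, k=6)
    print("gaps:", gaps)
    print("variable-byte bytes:", len(vb), "| Rice bits:", len(rb))
    assert from_gaps(vbyte_decode(vb)) == docs
    assert from_gaps(rice_decode(rb, 6, len(gaps))) == docs
```

The design difference the sketch illustrates: the variable-byte decoder consumes one whole byte per step with a mask and a shift, while the Rice decoder must inspect bits one at a time to find each unary quotient. That per-bit work is one plausible reason for the decoding-speed gap the abstract reports, traded against the slightly larger index that byte alignment produces.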
Year: 2002
DOI: 10.1145/564376.564416
Venue: SIGIR
Keywords: uncompressed index, byte-aligned codes result, inverted list, compact golomb-rice bitwise compression, retrieval efficiency, fast query evaluation, byte-aligned scheme, compact schemes result, integer compression scheme, well-known integer compression scheme, inverted index, compression scheme, indexing terms, indexation
Field: Integer, Inverted index, Index compression, Compression (physics), Data mining, Central processing unit, Information retrieval, Bitwise operation, Cache, Computer science, Uncompressed video
DocType: Conference
ISBN: 1-58113-561-0
Citations: 125
PageRank: 6.26
References: 12
Authors: 4
Name | Order | Citations | PageRank
Falk Scholer | 1 | 1244 | 93.27
Hugh E. Williams | 2 | 1048 | 93.45
John Yiannis | 3 | 144 | 8.07
Justin Zobel | 4 | 6882 | 880.46