Title
Document Listing On Repetitive Collections
Abstract
Many document collections consist largely of repeated material, and several indexes have been designed to take advantage of this. There has been only preliminary work, however, on document retrieval for repetitive collections. In this paper we show how one of those indexes, the run-length compressed suffix array (RLCSA), can be extended to support document listing. In our experiments, our additional structures on top of the RLCSA can reduce the query time for document listing by an order of magnitude while still using total space that is only a fraction of the raw collection size. As a byproduct, we develop a new document listing technique for general collections that is of independent interest.
Year
DOI
Venue
2013
10.1007/978-3-642-38905-4_12
COMBINATORIAL PATTERN MATCHING
Field
DocType
Volume
Discrete mathematics,World Wide Web,Information retrieval,Computer science,Range query (data structures),Wavelet Tree,Document retrieval,Compressed suffix array
Conference
7922
ISSN
Citations 
PageRank 
0302-9743
12
0.52
References 
Authors
18
5
Name
Order
Citations
PageRank
Travis Gagie164363.61
Kalle Karhu2243.56
Gonzalo Navarro364735.74
Simon J. Puglisi4113275.14
Jouni Sirén522214.85