Title
Document Counting in Compressed Space
Abstract
We address the problem of counting the number of strings in a collection where a given pattern appears, which has applications in information retrieval and data mining. Existing solutions are in a theoretical stage. In this paper we implement these solutions and explore compressed variants, aiming to reduce data structure size. Our main result is to uncover some unexpected compressibility properties of the fastest known data structure for the problem. By taking advantage of these properties, we can reduce the size of the structure by a factor of 5-400, depending on the dataset.
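For orientation, the problem stated in the abstract can be illustrated with a brute-force baseline. This is only a minimal sketch of the problem definition, not the data structures studied in the paper; the function name `count_documents` and the sample collection are assumptions made for illustration.

```python
def count_documents(collection, pattern):
    """Count how many strings in `collection` contain `pattern` at least once.

    Brute-force baseline: scans every string, so the time grows with the
    total length of the collection. A string with several occurrences of
    the pattern is still counted only once.
    """
    return sum(1 for doc in collection if pattern in doc)


# Hypothetical sample collection, used only to show the definition.
docs = ["abracadabra", "banana", "cabana", "xyz"]

# "banana" contains "ana" twice but contributes 1; "cabana" contributes 1.
print(count_documents(docs, "ana"))  # 2
```

The scan above stores nothing beyond the collection itself but touches every string on each query. An index built for this problem answers such queries without scanning the collection, and the abstract's concern is how small such an index can be made.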
Year
2015
DOI
10.1109/DCC.2015.55
Venue
2015 Data Compression Conference
Keywords
document counting, compressed space, string counting, information retrieval, data mining, data structure size reduction, compressibility properties
Field
Compressibility, String searching algorithm, Data mining, Data structure, Information retrieval, Computer science, Theoretical computer science, Document handling, Data compression, Encoding (memory)
DocType
Conference
ISSN
1068-0314
Citations
3
PageRank
0.39
References
14
Authors
6
Name               Order  Citations  PageRank
Travis Gagie       1      643        63.61
Aleksi Hartikainen 2      37         1.98
Juha Kärkkäinen    3      1354       95.20
Gonzalo Navarro    4      109        11.07
Simon J. Puglisi   5      1132       75.14
Jouni Sirén        6      222        14.85