Abstract | ||
---|---|---|
We address the problem of counting the number of strings in a collection where a given pattern appears, which has applications in information retrieval and data mining. Existing solutions are in a theoretical stage. In this paper we implement these solutions and explore compressed variants, aiming to reduce data structure size. Our main result is to uncover some unexpected compressibility properties of the fastest known data structure for the problem. By taking advantage of these properties, we can reduce the size of the structure by a factor of 5-400, depending on the dataset. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1109/DCC.2015.55 | 2015 Data Compression Conference |
Keywords | Field | DocType |
document counting,compressed space,string counting,information retrieval,data mining,data structure size reduction,compressibility properties | Compressibility,String searching algorithm,Data mining,Data structure,Information retrieval,Computer science,Theoretical computer science,Document handling,Data compression,Encoding (memory) | Conference |
ISSN | Citations | PageRank |
1068-0314 | 3 | 0.39 |
References | Authors |
14 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Travis Gagie | 1 | 643 | 63.61 |
Aleksi Hartikainen | 2 | 37 | 1.98 |
Juha Kärkkäinen | 3 | 1354 | 95.20 |
Gonzalo Navarro | 4 | 109 | 11.07 |
Simon J. Puglisi | 5 | 1132 | 75.14 |
Jouni Sirén | 6 | 222 | 14.85 |