Abstract | ||
---|---|---|
The FM index (Ferragina and Manzini in J ACM 52(4):552–581, 2005) is a widely-used compressed data structure that stores a string T in a compressed form and also supports fast pattern matching queries. In this paper, we describe new FM-index variants that combine nice theoretical properties, simple implementation and improved practical performance. Our main theoretical result is a new technique called fixed block compression boosting, which is a simpler and faster alternative to optimal compression boosting and implicit compression boosting used in previous FM-indexes. We also describe several new techniques for implementing fixed-block boosting efficiently, including a new, fast, and space-efficient implementation of wavelet trees. Our extensive experiments show the new indexes to be consistently fast and small relative to the state-of-the-art, and thus they make a good “off-the-shelf” choice for many applications. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1007/s00453-018-0475-9 | Algorithmica |
Keywords | Field | DocType |
Text indexing,Wavelet tree,FM-index,Compression boosting,Suffix array,Pattern matching | Compressed data structure,Compression (physics),Discrete mathematics,Algorithm,Suffix array,Wavelet Tree,Boosting (machine learning),FM-index,Pattern matching,Mathematics,Wavelet | Journal |
Volume | Issue | ISSN |
81.0 | 4 | 1432-0541 |
Citations | PageRank | References |
1 | 0.37 | 20 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Simon Gog | 1 | 345 | 24.99 |
Juha Kärkkäinen | 2 | 1354 | 95.20 |
Dominik Kempa | 3 | 142 | 16.37 |
Matthias Petri | 4 | 164 | 14.74 |
Simon J. Puglisi | 5 | 1132 | 75.14 |