Title
Fixed Block Compression Boosting in FM-Indexes: Theory and Practice
Abstract
The FM index (Ferragina and Manzini in J ACM 52(4):552–581, 2005) is a widely-used compressed data structure that stores a string T in a compressed form and also supports fast pattern matching queries. In this paper, we describe new FM-index variants that combine nice theoretical properties, simple implementation and improved practical performance. Our main theoretical result is a new technique called fixed block compression boosting, which is a simpler and faster alternative to optimal compression boosting and implicit compression boosting used in previous FM-indexes. We also describe several new techniques for implementing fixed-block boosting efficiently, including a new, fast, and space-efficient implementation of wavelet trees. Our extensive experiments show the new indexes to be consistently fast and small relative to the state-of-the-art, and thus they make a good “off-the-shelf” choice for many applications.
Year
DOI
Venue
2019
10.1007/s00453-018-0475-9
Algorithmica
Keywords
Field
DocType
Text indexing,Wavelet tree,FM-index,Compression boosting,Suffix array,Pattern matching
Compressed data structure,Compression (physics),Discrete mathematics,Algorithm,Suffix array,Wavelet Tree,Boosting (machine learning),FM-index,Pattern matching,Mathematics,Wavelet
Journal
Volume
Issue
ISSN
81.0
4
1432-0541
Citations 
PageRank 
References 
1
0.37
20
Authors
5
Name
Order
Citations
PageRank
Simon Gog134524.99
Juha Kärkkäinen2135495.20
Dominik Kempa314216.37
Matthias Petri416414.74
Simon J. Puglisi5113275.14