Title
Epr-Dictionaries: A Practical And Fast Data Structure For Constant Time Searches In Unidirectional And Bidirectional Fm Indices
Abstract
The unidirectional FM index was introduced by Ferragina and Manzini in 2000 and allows to search a pattern in the index in one direction. The bidirectional FM index (2FM) was introduced by Lam et al. in 2009. It allows to search for a pattern by extending an infix of the pattern arbitrarily to the left or right. If a is the size of the alphabet then the method of Lam et al. can conduct one step in time 0(a) while needing space 0(sigma.n) using constant time rank queries on bit vectors. Schnattinger and colleagues improved this time to 0(log a) while using 0(log sigma.n) bits of space for both, the FM and 2FM index. This is achieved by the use of binary wavelet trees.In this paper we introduce a new, practical method for conducting an exact search in a uni- and bidirectional FM index in 0(1) time per step while using 0(log sigma. n) + o(log sigma. sigma.n) bits of space. This is done by replacing the binary wavelet tree by a new data structure, the Enhanced Prefixsurn Rank dictionary (EPR-dictionary).We implemented this method in the SeqAn C++ library and experimentally validated our theoretical results. In addition we compared our implementation with other freely available implementations of bidirectional indices and show that we are between approximate to 2.2-4.2 times faster. This will have a large impact for many bioinformatics applications that rely on practical implementations of (2)FM indices e.g. for read mapping. To our knowledge this is the first implementation of a constant time method for a search step in 2FM indices.
Year
DOI
Venue
2017
10.1007/978-3-319-56970-3_12
RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB 2017
Keywords
Field
DocType
FM index, Bidirectional, BWT, Bit vector, Rank queries, Read mapping
Data structure,Combinatorics,Computer science,Electron paramagnetic resonance,Algorithm,FM-index,Sigma,Genetics,Bit array,Alphabet,Binary number
Conference
Volume
ISSN
Citations 
10229
0302-9743
0
PageRank 
References 
Authors
0.34
15
3
Name
Order
Citations
PageRank
Christopher Pockrandt100.34
Marcel Ehrhardt200.68
Knut Reinert31020105.87