Title
Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index.
Abstract
Sequence-level searches on large collections of RNA sequencing experiments, such as the NCBI Sequence Read Archive (SRA), would enable one to ask many questions about the expression or variation of a given transcript in a population. Existing approaches, such as the sequence Bloom tree, suffer from fundamental limitations of the Bloom filter, resulting in slow build and query times, less-than-optimal space usage, and potentially large numbers of false positives. This paper introduces Mantis, a space-efficient system that uses new data structures to index thousands of raw-read experiments and facilitates large-scale sequence searches. In our evaluation, index construction with Mantis is 6× faster and yields a 20% smaller index than the state-of-the-art split sequence Bloom tree (SSBT). For queries, Mantis is 6–108× faster than SSBT and has no false positives or false negatives. For example, Mantis was able to search for all 200,400 known human transcripts in an index of 2,652 RNA sequencing experiments in 82 min; SSBT took close to 4 days.
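To make the idea of an exact sequence-search index concrete, here is a minimal sketch, not the Mantis implementation. It assumes a toy k-mer length, uses a plain Python dict in place of a compact structure such as the counting quotient filter, and groups k-mers by the exact set of experiments that contain them (a "color class"). The function names, the `theta` threshold parameter, and the sample reads are all hypothetical.

```python
from collections import Counter

K = 5  # toy k-mer length (an assumption; real indexes use a larger k)


def kmers(seq, k=K):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(experiments, k=K):
    """Map each k-mer to the ID of its color class.

    A color class is the exact set of experiments containing a k-mer;
    k-mers that occur in the same experiments share one class ID.
    """
    kmer_to_exps = {}
    for exp_id, reads in experiments.items():
        for read in reads:
            for km in kmers(read, k):
                kmer_to_exps.setdefault(km, set()).add(exp_id)

    class_ids = {}      # frozenset of experiment IDs -> class ID
    color_classes = []  # class ID -> frozenset of experiment IDs
    kmer_to_class = {}  # plain dict standing in for a compact exact map
    for km, exps in kmer_to_exps.items():
        key = frozenset(exps)
        if key not in class_ids:
            class_ids[key] = len(color_classes)
            color_classes.append(key)
        kmer_to_class[km] = class_ids[key]
    return kmer_to_class, color_classes


def query(seq, kmer_to_class, color_classes, theta=0.8, k=K):
    """Return experiments holding at least a theta fraction of the
    query's distinct k-mers. Lookups are exact, so there are no
    false positives at the k-mer level."""
    query_kmers = set(kmers(seq, k))
    if not query_kmers:
        return set()
    hits = Counter()
    for km in query_kmers:
        cid = kmer_to_class.get(km)
        if cid is not None:
            hits.update(color_classes[cid])  # +1 for each experiment in the class
    need = theta * len(query_kmers)
    return {exp for exp, n in hits.items() if n >= need}


# Hypothetical toy data: two "experiments", each a list of reads.
experiments = {
    "exp_A": ["ACGTACGTGA"],
    "exp_B": ["ACGTACGTTT", "GGGGGCCCCC"],
}
index, classes = build_index(experiments)
shared = query("ACGTACGT", index, classes, theta=1.0)
only_b = query("GGGGGCC", index, classes, theta=1.0)
```

Because many k-mers tend to occur in exactly the same experiments, storing one class ID per k-mer plus a single copy of each distinct experiment set avoids repeating the set for every k-mer; in this sketch that saving comes from the `color_classes` list.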
Year
2018
DOI
10.1016/j.cels.2018.05.021
Venue
Cell Systems
Keywords
sequence search, RNA sequencing, de Bruijn graph, color equivalence classes, Mantis, experiment discovery, counting quotient filter, sequence Bloom tree, Bloom filter
Field
Data structure, Population, Bloom filter, Graph traversal, Computer science, Algorithm, Search engine indexing, De Bruijn graph, False positive paradox, Mantis
DocType
Conference
Volume
7
Issue
2
ISSN
2405-4712
Citations
2
PageRank
0.40
References
0
Authors
6
Name | Order | Citations | PageRank
Prashant Pandey | 1 | 18 | 3.01
Fatemeh Almodaresi | 2 | 7 | 2.56
Michael A. Bender | 3 | 2144 | 138.24
Alex Ramirez | 4 | 1411 | 58.19
Rob Johnson | 5 | 562 | 39.43
Rob Patro | 6 | 111 | 12.98