Abstract | ||
---|---|---|
Sequence-level searches on large collections of RNA sequencing experiments, such as the NCBI Sequence Read Archive (SRA), would enable one to ask many questions about the expression or variation of a given transcript in a population. Existing approaches, such as the sequence Bloom tree, suffer from fundamental limitations of the Bloom filter, resulting in slow build and query times, less-than-optimal space usage, and potentially large numbers of false positives. This paper introduces Mantis, a space-efficient system that uses new data structures to index thousands of raw-read experiments and facilitates large-scale sequence searches. In our evaluation, index construction with Mantis is 6× faster and yields a 20% smaller index than the state-of-the-art split sequence Bloom tree (SSBT). For queries, Mantis is 6–108× faster than SSBT and has no false positives or false negatives. For example, Mantis was able to search for all 200,400 known human transcripts in an index of 2,652 RNA sequencing experiments in 82 min; SSBT took close to 4 days. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1016/j.cels.2018.05.021 | Cell Systems |
Keywords | Field | DocType |
sequence search, RNA sequencing, de Bruijn graph, color equivalence classes, Mantis, experiment discovery, counting quotient filter, sequence Bloom tree, Bloom filter | Data structure, Population, Bloom filter, Graph traversal, Computer science, Algorithm, Search engine indexing, De Bruijn graph, False positive paradox, Mantis | Conference |
Volume | Issue | ISSN |
7 | 2 | 2405-4712 |
Citations | PageRank | References |
2 | 0.40 | 0 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Prashant Pandey | 1 | 18 | 3.01 |
Fatemeh Almodaresi | 2 | 7 | 2.56 |
Michael A. Bender | 3 | 2144 | 138.24 |
Alex Ramirez | 4 | 1411 | 58.19 |
Rob Johnson | 5 | 562 | 39.43 |
Rob Patro | 6 | 111 | 12.98 |