Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index.

Paper Info

Title
Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index.

Abstract
Sequence-level searches on large collections of RNA sequencing experiments, such as the NCBI Sequence Read Archive (SRA), would enable one to ask many questions about the expression or variation of a given transcript in a population. Existing approaches, such as the sequence Bloom tree, suffer from fundamental limitations of the Bloom filter, resulting in slow build and query times, less-than-optimal space usage, and potentially large numbers of false positives. This paper introduces Mantis, a space-efficient system that uses new data structures to index thousands of raw-read experiments and facilitates large-scale sequence searches. In our evaluation, index construction with Mantis is 6× faster and yields a 20% smaller index than the state-of-the-art split sequence Bloom tree (SSBT). For queries, Mantis is 6–108× faster than SSBT and has no false positives or false negatives. For example, Mantis was able to search for all 200,400 known human transcripts in an index of 2,652 RNA sequencing experiments in 82 min; SSBT took close to 4 days.

Year	2018
DOI	10.1016/j.cels.2018.05.021
Venue	Cell Systems
Keywords	sequence search, RNA sequencing, de Bruijn graph, color equivalence classes, Mantis, experiment discovery, counting quotient filter, sequence Bloom tree, Bloom filter
Field	Data structure, Population, Bloom filter, Graph traversal, Computer science, Algorithm, Search engine indexing, De Bruijn graph, False positive paradox, Mantis
DocType	Conference
Volume	7
Issue	2
ISSN	2405-4712
Citations	2
PageRank	0.40
References	0
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
Prashant Pandey	1	18	3.01
Fatemeh Almodaresi	2	7	2.56
Michael A. Bender	3	2144	138.24
Alex Ramirez	4	1411	58.19
Rob Johnson	5	562	39.43
Rob Patro	6	111	12.98
