Title
A randomized Numerical Aligner (rNA)
Abstract
With the advent of new sequencing technologies able to produce an enormous quantity of short genomic sequences, new tools able to search for them inside a genomic reference sequence have emerged. Because of chemical reading errors or of the variability between organisms, one is interested in finding not only exact occurrences, but also occurrences with up to k mismatches. The contribution of this paper is twofold. On the one hand, we present a generalization of the classical Rabin-Karp string matching algorithm to solve the k-mismatch problem, with average complexity O(n+m) (n text and m pattern lengths, respectively). On the other hand, we show how to employ this idea in conjunction with an index over the text, allowing to search a pattern, with up to k mismatches, in time proportional to its length. This novel tool-rNA (randomized Numerical Aligner)-is in general faster and more accurate than other available tools like SOAP2, BWA, and BOWTIE. rNA executables and source code are freely available at http://iga-rna.sourceforge.net/.
Year
DOI
Venue
2012
10.1016/j.jcss.2011.12.007
J. Comput. Syst. Sci.
Keywords
Field
DocType
available tool,average complexity,new tool,randomized numerical aligner,genomic reference sequence,chemical reading error,short genomic sequence,new sequencing technology,n text,k mismatches,m pattern length,hamming distance,next generation sequencing,string matching,genome sequence,approximate string matching,sequencing,indexation
String searching algorithm,Commentz-Walter algorithm,Source code,Algorithm,Theoretical computer science,Hamming distance,Approximate string matching,Residue number system,Reference genome,Mathematics,Executable
Journal
Volume
Issue
ISSN
78
6
0022-0000
ISBN
Citations 
PageRank 
3-642-13088-7
5
0.61
References 
Authors
27
3
Name
Order
Citations
PageRank
Alberto Policriti178574.35
Alexandru I. Tomescu210322.57
Francesco Vezzi3796.03