Efficient Alignment Free Sequence Comparison With Bounded Mismatches - Citegraph

Paper Info

Title
Efficient Alignment Free Sequence Comparison With Bounded Mismatches

Abstract
Alignment-free sequence comparison methods are attracting persistent interest, driven by data-intensive applications in genome-wide molecular taxonomy and phylogenetic reconstruction. Among the methods based on substring composition, the Average Common Substring (ACS) measure proposed by Burstein et al. (RECOMB 2005) admits a straightforward linear-time sequence comparison algorithm, while yielding impressive results in multiple applications. An important direction of research is to extend the approach to permit a bounded edit/Hamming distance between substrings, so as to reflect the evolutionary process more accurately. To date, however, algorithms designed to incorporate $k \geq 1$ mismatches have $O(kn^2)$ worst-case complexity, worse than the $O(n^2)$ alignment algorithms they are meant to replace. On the other hand, accounting for mismatches has been shown to lead to much improved classification, while heuristics can improve practical performance. In this paper, we close the gap by presenting the first provably efficient algorithm for the k-mismatch average common substring ($\mathrm{ACS}_k$) problem that takes $O(n)$ space and $O(n \log^{k+1} n)$ time in the worst case for any constant $k$. Our method extends the generalized suffix tree model to incorporate a carefully selected bounded set of perturbed suffixes, and may be applicable to other complex approximate sequence matching problems.

Year	DOI	Venue
2015	10.1007/978-3-319-16706-0_1	RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY (RECOMB 2015)
Field	DocType	Volume
Combinatorics,Sequence matching,Substring,Biology,Bounded set,Heuristics,Hamming distance,Generalized suffix tree,Genetics,Time complexity,Bounded function	Conference	9029
ISSN	Citations	PageRank
0302-9743	9	0.61
References	Authors
19	3

Authors (3 rows)

Name	Order	Citations	PageRank
Srinivas Aluru	1	1166	122.83
Alberto Apostolico	2	15	1.37
Sharma V. Thankachan	3	289	41.02
