Abstract | ||
---|---|---|
Summary form only given. The genome mappability problem refers to cataloging repetitive occurrences of every substring of length m in a genome, and its k-mappability variant extends this to approximate repeats by allowing up to k mismatches. This problem is formulated as follows: Given a sequence S[1, n] of length n over the constant DNA alphabet Σ = {A, C, G, T}, and two integers k and m ≤ n, output an integer array F
<sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">k</sub>
, such that: F
<sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">k</sub>
[i] = |{j ≠ i|d
<sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">H</sub>
(S[i, i + m - 1], S[j, j + m - 1]) ≤ k}| where d
<sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">H</sub>
(·,·) represents the hamming distance. Derrien et al. [PLoS one 2012] represented this problem within the framework of genome analysis. In this work we present a provably efficient algorithm for 1-mappability with O(n log n) worst case run time and O(n) spece. The fundamental technique is the heavy path decomposition on the suffix tree (ST) of S, and the entire work is based on the framework by Thankachan et al. [RECOMB 2018]. The previous best known run time is O(n log n log log n) [Alzamel et al., COCOA 2017]. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/ICCABS.2018.8541897 | 2018 IEEE 8th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS) |
Keywords | Field | DocType |
Genome mappability,heavy path decomposition,Hamming distance | Integer,Log-log plot,Combinatorics,Substring,Computer science,Heavy path decomposition,Hamming distance,Suffix tree,Time complexity,Genetics,Computation | Conference |
ISSN | ISBN | Citations |
2164-229X | 978-1-5386-8521-1 | 0 |
PageRank | References | Authors |
0.34 | 0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sahar Hooshmand | 1 | 1 | 3.07 |
Paniz Abedin | 2 | 1 | 1.72 |
Daniel Gibney | 3 | 0 | 4.06 |
Aluru, Srinivas | 4 | 1166 | 122.83 |
Sharma V. Thankachan | 5 | 289 | 41.02 |