Title
Faster Computation of Genome Mappability with one Mismatch
Abstract
Summary form only given. The genome mappability problem refers to cataloging repetitive occurrences of every substring of length m in a genome, and its k-mappability variant extends this to approximate repeats by allowing up to k mismatches. This problem is formulated as follows: Given a sequence S[1, n] of length n over the constant DNA alphabet Σ = {A, C, G, T}, and two integers k and m ≤ n, output an integer array F <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">k</sub> , such that: F <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">k</sub> [i] = |{j ≠ i|d <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">H</sub> (S[i, i + m - 1], S[j, j + m - 1]) ≤ k}| where d <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">H</sub> (·,·) represents the hamming distance. Derrien et al. [PLoS one 2012] represented this problem within the framework of genome analysis. In this work we present a provably efficient algorithm for 1-mappability with O(n log n) worst case run time and O(n) spece. The fundamental technique is the heavy path decomposition on the suffix tree (ST) of S, and the entire work is based on the framework by Thankachan et al. [RECOMB 2018]. The previous best known run time is O(n log n log log n) [Alzamel et al., COCOA 2017].
Year
DOI
Venue
2018
10.1109/ICCABS.2018.8541897
2018 IEEE 8th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS)
Keywords
Field
DocType
Genome mappability,heavy path decomposition,Hamming distance
Integer,Log-log plot,Combinatorics,Substring,Computer science,Heavy path decomposition,Hamming distance,Suffix tree,Time complexity,Genetics,Computation
Conference
ISSN
ISBN
Citations 
2164-229X
978-1-5386-8521-1
0
PageRank 
References 
Authors
0.34
0
5
Name
Order
Citations
PageRank
Sahar Hooshmand113.07
Paniz Abedin211.72
Daniel Gibney304.06
Aluru, Srinivas41166122.83
Sharma V. Thankachan528941.02