Efficient Computation of Sequence Mappability - Citegraph

Paper Info

Title
Efficient Computation of Sequence Mappability

Abstract
Sequence mappability is an important task in genome resequencing. In the (k, m)-mappability problem, for a given sequence T of length n, the goal is to compute a table whose ith entry is the number of indices $$j \ne i$$ such that the length-m substrings of T starting at positions i and j have at most k mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of $$k=1$$ . We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for $$k=O(1)$$ , works in $$O(n)$$ space and, with high probability, in $$O(n \cdot \min \{m^k,\log ^k n\})$$ time. Our algorithm requires a careful adaptation of the k-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop $$O(n^2)$$ -time algorithms to compute all (k, m)-mappability tables for a fixed m and all $$k\in \{0,\ldots ,m\}$$ or a fixed k and all $$m\in \{k,\ldots ,n\}$$ . Finally, we show that, for $$k,m = \Theta (\log n)$$ , the (k, m)-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper presented at SPIRE 2018.

Year	DOI	Venue
2022	10.1007/s00453-022-00934-y	Algorithmica
Keywords	DocType	Volume
Sequence mappability, k-errata tree, Hamming distance	Journal	84
Issue	ISSN	Citations
5	0178-4617	0
PageRank	References	Authors
0.34	20	6

Authors (6 rows)

Cited by (0 rows)

References (20 rows)

Name	Order	Citations	PageRank
Panagiotis Charalampopoulos	1	3	4.12
Costas S. Iliopoulos	2	1534	167.43
Tomasz Kociumaka	3	0	0.34
Solon P. Pissis	4	0	0.34
Jakub Radoszewski	5	624	50.36
Juliusz Straszynski	6	2	5.15

1