Title
Longest Common Prefix with Mismatches.
Abstract
The Longest Common Prefix LCP array is a data structure commonly used in combination with the Suffix Array. However, in some settings we are interested in the LCP values per se since they provide useful information on the repetitiveness of the underlying sequence. Since sequences can contain alterations, which can be either malicious plagiarism attempts or pseudo-random as in sequencing experiments, when using LCP values to measure repetitiveness it makes sense to allow for a small number of errors. In this paper we formalize this notion by considering the longest common prefix in the presence of mismatches. In particular, we propose an algorithm that computes, for each text suffix, the length of its longest prefix that occurs elsewhere in the text with at most one mismatch. For a sequence of length n our algorithm uses $$\\Theta n\\log n$$ bits and runs in $$\\mathcal {O}n \\text{L}_{ave}\\log n/\\log \\log n$$ time where $$\\text{L}_{ave}$$ is the average LCP of the input sequence. Although $$\\text{L}_{ave}$$ is $$\\Theta n$$ in the worst case, recent analyses of real world data show that it usually grows logarithmically with the input size. We then describe and analyse a second algorithm that uses a greedy strategy to reduce the amount of computation and that can be turned into an even faster algorithm if allow an additive one-sided error. Finally, we consider the related problem of computing the 1-mappability of a sequence. In this problem we are asked to compute, for each length-m substring of the input sequence, the number of other substrings which are at Hamming distance one. For this problem we propose an algorithm that takes $$\\mathcal {O}m n \\log n/\\log \\log n$$ time using $$\\Theta n \\log n$$ bits of space.
Year
DOI
Venue
2015
10.1007/978-3-319-23826-5_29
SPIRE
Field
DocType
Volume
LCP array,Binary logarithm,Substring,Combinatorics,Suffix,Greedy algorithm,Suffix array,Prefix,Hamming distance,Mathematics
Conference
9309
ISSN
Citations 
PageRank 
0302-9743
5
0.50
References 
Authors
12
1
Name
Order
Citations
PageRank
Giovanni Manzini11584111.42