Title
Streaming algorithms for embedding and computing edit distance in the low distance regime.
Abstract
The Hamming and the edit metrics are two common notions of measuring distances between pairs of strings x,y lying in the Boolean hypercube. The edit distance between x and y is defined as the minimum number of character insertion, deletion, and bit flips needed for converting x into y. Whereas, the Hamming distance between x and y is the number of bit flips needed for converting x to y. In this paper we study a randomized injective embedding of the edit distance into the Hamming distance with a small distortion. We show a randomized embedding with quadratic distortion. Namely, for any x,y satisfying that their edit distance equals k, the Hamming distance between the embedding of x and y is O(k2) with high probability. This improves over the distortion ratio of O( n*n) obtained by Jowhari (2012) for small values of k. Moreover, the embedding output size is linear in the input size and the embedding can be computed using a single pass over the input. We provide several applications for this embedding. Among our results we provide a one-pass (streaming) algorithm for edit distance running in space O(s) and computing edit distance exactly up-to distance s1/6. This algorithm is based on kernelization for edit distance that is of independent interest.
Year
DOI
Venue
2016
10.1145/2897518.2897577
STOC '16: Symposium on Theory of Computing Cambridge MA USA June, 2016
Keywords
Field
DocType
Edit distance,Hamming distance,randomized embedding,low distortion,string,kernel
Edit distance,String-to-string correction problem,Discrete mathematics,Hamming code,Combinatorics,Embedding,Wagner–Fischer algorithm,Jaro–Winkler distance,Hamming distance,Damerau–Levenshtein distance,Mathematics
Conference
ISSN
ISBN
Citations 
0737-8017
978-1-4503-4132-5
18
PageRank 
References 
Authors
0.70
18
3
Name
Order
Citations
PageRank
Diptarka Chakraborty1318.29
Elazar Goldenberg2181.04
Michal Koucky3776.06