Title
A space-efficient alphabet-independent Four-Russians' lookup table and a multithreaded Four-Russians' edit distance algorithm.
Abstract
Given two strings X ( | X | = m ) and Y ( | Y | = n ) over an alphabet Σ, the edit distance between X and Y can be computed in O ( m n / t ) time with the help of the Four-Russians' lookup table whose block size is t. The Four-Russians' lookup table can be constructed in O ( ( 3 | Σ | ) 2 t t 2 ) time using O ( ( 3 | Σ | ) 2 t t ) space. However, the construction time and space requirement of the lookup table grow very fast as the alphabet size increases and thus it has been used only when | Σ | is very small. For example, when a string is a protein sequence, | Σ | = 20 and thus it is almost impossible to use the Four-Russians' lookup table on typical workstations. In this paper, we present an efficient alphabet-independent Four-Russians' lookup table. It requires O ( 3 2 t ( 2 t ) ! t ) space and can be constructed in O ( 3 2 t ( 2 t ) ! t 2 ) time. Thus, the Four-Russians' lookup table can be constructed and used irrespective of the alphabet size. The time and space complexity were achieved by compacting the lookup table using a clever encoding of the preprocessed strings. Experimental results show that the space requirement of the lookup table is reduced to about 1/5,172,030 of its original size when | Σ | = 26 and t = 4 . Furthermore, we present efficient multithreaded parallel algorithms for edit distance computation using the Four-Russians' lookup table. The parallel algorithm for lookup table construction runs in O ( t ) time and the parallel algorithm for edit distance computation between X and Y runs in O ( m + n ) time. Experiments performed on CUDA-supported GPU show that our algorithm runs about 942 times faster than the sequential version of the original Four-Russians' algorithm for 100 pairs of random strings of length approximately 1,000 when | Σ | = 4 and t = 4 .
Year
DOI
Venue
2016
10.1016/j.tcs.2016.04.028
Theoretical Computer Science
Keywords
Field
DocType
Approximate string matching,Edit distance,Four-Russians' algorithm,Parallelization
Block size,Edit distance,Discrete mathematics,Lookup table,Combinatorics,Parallel algorithm,Spacetime,Approximate string matching,Mathematics,Encoding (memory),Computation
Journal
Volume
Issue
ISSN
656
PB
0304-3975
Citations 
PageRank 
References 
0
0.34
0
Authors
4
Name
Order
Citations
PageRank
Youngho Kim100.34
Joong Chae Na216218.21
Heejin Park323521.63
Jeong Seop Sim423120.00