Title | ||
---|---|---|
A space-efficient alphabet-independent Four-Russians' lookup table and a multithreaded Four-Russians' edit distance algorithm. |
Abstract | ||
---|---|---|
Given two strings X ( | X | = m ) and Y ( | Y | = n ) over an alphabet Σ, the edit distance between X and Y can be computed in O ( m n / t ) time with the help of the Four-Russians' lookup table whose block size is t. The Four-Russians' lookup table can be constructed in O ( ( 3 | Σ | ) 2 t t 2 ) time using O ( ( 3 | Σ | ) 2 t t ) space. However, the construction time and space requirement of the lookup table grow very fast as the alphabet size increases and thus it has been used only when | Σ | is very small. For example, when a string is a protein sequence, | Σ | = 20 and thus it is almost impossible to use the Four-Russians' lookup table on typical workstations. In this paper, we present an efficient alphabet-independent Four-Russians' lookup table. It requires O ( 3 2 t ( 2 t ) ! t ) space and can be constructed in O ( 3 2 t ( 2 t ) ! t 2 ) time. Thus, the Four-Russians' lookup table can be constructed and used irrespective of the alphabet size. The time and space complexity were achieved by compacting the lookup table using a clever encoding of the preprocessed strings. Experimental results show that the space requirement of the lookup table is reduced to about 1/5,172,030 of its original size when | Σ | = 26 and t = 4 . Furthermore, we present efficient multithreaded parallel algorithms for edit distance computation using the Four-Russians' lookup table. The parallel algorithm for lookup table construction runs in O ( t ) time and the parallel algorithm for edit distance computation between X and Y runs in O ( m + n ) time. Experiments performed on CUDA-supported GPU show that our algorithm runs about 942 times faster than the sequential version of the original Four-Russians' algorithm for 100 pairs of random strings of length approximately 1,000 when | Σ | = 4 and t = 4 . |
Year | DOI | Venue |
---|---|---|
2016 | 10.1016/j.tcs.2016.04.028 | Theoretical Computer Science |
Keywords | Field | DocType |
Approximate string matching,Edit distance,Four-Russians' algorithm,Parallelization | Block size,Edit distance,Discrete mathematics,Lookup table,Combinatorics,Parallel algorithm,Spacetime,Approximate string matching,Mathematics,Encoding (memory),Computation | Journal |
Volume | Issue | ISSN |
656 | PB | 0304-3975 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Youngho Kim | 1 | 0 | 0.34 |
Joong Chae Na | 2 | 162 | 18.21 |
Heejin Park | 3 | 235 | 21.63 |
Jeong Seop Sim | 4 | 231 | 20.00 |