Title
An Efficient Parallel Algorithm for the Multiple Longest Common Subsequence (MLCS) Problem
Abstract
Finding the multiple longest common subsequence (MLCS) is an important problemin the areas of bioinformatics and computational genomics. Approaches that are more efficient than the standard dynamic programming method have been introduced and successfully parallelized for the special cases of 2 sequences. However, the increasing complexity and size of biological data require an efficient method applicable to an arbitrary number of sequences as well as its efficient parallelization. A recently developed dominant points method for a general MLCS problem has been shown a significant performance improvement over the dynamic programming method, when number of sequences is larger than two. At the same time, the approach has revealed strong demand for its parallelization, in order to be applied to the larger families of sequences or sequences of the greater lengths. In this paper, we introduce an efficient parallel algorithm to find a MLCS for an arbitrary number of sequences, which is based on the dominant points method. When the number of processors is not greater than the size of alphabet multiplied by the number of sequences, the parallel algorithm is estimated to have the asymptotically linear speed up. We experimentally tested the algorithm using sets of randomly generated sequences over different alphabets as well as the protein sequences from a family of homologous proteins. We found that the performance of the algorithm increases with the number of input sequences and reaches a near-linear speedup for eight sequences.
Year
DOI
Venue
2008
10.1109/ICPP.2008.79
ICPP
Keywords
Field
DocType
parallel algorithm,efficient parallel algorithm,algorithm increase,arbitrary number,dominant points method,general mlcs problem,efficient method,dynamic programming method,multiple longest common subsequence,standard dynamic programming method,efficient parallelization,genetics,parallel,longest common subsequence,proteins,biological data,protein sequence,homologous proteins,biology,computational genomics,algorithm design and analysis,parallel algorithms,bioinformatics
Biological data,Dynamic programming,Longest common subsequence problem,Algorithm design,Computer science,Parallel algorithm,Parallel computing,Computational genomics,Speedup,Performance improvement
Conference
Citations 
PageRank 
References 
9
0.54
10
Authors
3
Name
Order
Citations
PageRank
Dmitry Korkin1718.34
Qingguo Wang292.23
Yi Shang3305.25