Title
A novel fast multiple nucleotide sequence alignment method based on FM-index
Abstract
Multiple sequence alignment (MSA) is fundamental to many biological applications. However, most classical MSA algorithms have difficulty handling large-scale sets of sequences, especially long ones. Some recent aligners therefore adopt an efficient divide-and-conquer strategy that divides long sequences into several short sub-sequences. Selecting the common segments (i.e. anchors) at which the sequences are divided is critical, because it directly affects both accuracy and running time. We therefore propose a novel algorithm, FMAlign, to improve the performance of multiple nucleotide sequence alignment. We use an FM-index to extract long common segments at low cost, rather than a space-consuming hash table. After the longer optimal common segments are found, the sequences are divided at these segments. FMAlign has been tested on virus, bacteria and human mitochondrial genome datasets, and compared with existing MSA methods such as MAFFT, HAlign and FAME. The experiments show that our method outperforms the existing methods in running time and achieves high accuracy on long sequence sets. Together, the results demonstrate that our method is applicable to large-scale nucleotide sequence sets, in terms of both sequence length and number of sequences. The source code and related data are accessible at https://github.com/iliuh/FMAlign.
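The divide-and-conquer idea described in the abstract can be illustrated with a minimal Python sketch of its two ingredients: locating common segments with an FM-index (backward search over the Burrows-Wheeler transform) and cutting every sequence at those segments so the pieces can be aligned separately. This is not FMAlign's implementation. The names `FMIndex`, `find_anchors` and `split_by_anchors`, the fixed segment length `k`, the rule that an anchor is a k-mer of the first sequence occurring exactly once in every sequence, and the greedy left-to-right chaining are all hypothetical choices made for illustration; the search for longer optimal common segments mentioned in the abstract is not reproduced. The suffix array is built by naive sorting and the occurrence table is stored uncompressed, so this toy does not show the memory savings over a hash table that a sampled, compressed FM-index provides.

```python
from typing import Dict, List, Tuple


class FMIndex:
    """Toy FM-index over one DNA string (naive suffix sort; illustration only)."""

    def __init__(self, text: str) -> None:
        s = text + "$"  # sentinel; '$' sorts before A, C, G, T
        # Suffix array by naive sorting: acceptable for short toy inputs only.
        self.sa = sorted(range(len(s)), key=lambda i: s[i:])
        # BWT: character preceding each sorted suffix (s[-1] == '$' when i == 0).
        self.bwt = "".join(s[i - 1] for i in self.sa)
        # C[c]: number of characters in s that are smaller than c.
        self.C: Dict[str, int] = {}
        total = 0
        for c in sorted(set(s)):
            self.C[c] = total
            total += s.count(c)
        # occ[c][i]: occurrences of c in bwt[:i] (uncompressed rank table).
        self.occ: Dict[str, List[int]] = {c: [0] for c in self.C}
        for ch in self.bwt:
            for c, counts in self.occ.items():
                counts.append(counts[-1] + (ch == c))

    def interval(self, pattern: str) -> Tuple[int, int]:
        """Backward search: half-open suffix-array interval of suffixes starting with pattern."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            if c not in self.C:
                return 0, 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0, 0
        return lo, hi

    def locate(self, pattern: str) -> List[int]:
        """Sorted start positions of pattern in the indexed text."""
        lo, hi = self.interval(pattern)
        return sorted(self.sa[lo:hi])


def find_anchors(seqs: List[str], k: int) -> List[Tuple[int, ...]]:
    """Hypothetical rule: a k-mer of seqs[0] occurring exactly once in every sequence,
    kept greedily if it starts after the previous anchor ends in all sequences."""
    indexes = [FMIndex(s) for s in seqs]
    anchors: List[Tuple[int, ...]] = []
    prev_end = [0] * len(seqs)  # end of the last accepted anchor in each sequence
    for start in range(len(seqs[0]) - k + 1):
        kmer = seqs[0][start:start + k]
        hits = [idx.locate(kmer) for idx in indexes]
        if all(len(h) == 1 for h in hits):
            pos = tuple(h[0] for h in hits)
            # Keep anchors collinear and non-overlapping in every sequence.
            if all(p >= e for p, e in zip(pos, prev_end)):
                anchors.append(pos)
                prev_end = [p + k for p in pos]
    return anchors


def split_by_anchors(seqs: List[str], anchors: List[Tuple[int, ...]], k: int) -> List[List[str]]:
    """Cut each sequence at the anchors; returns one fragment list per inter-anchor block."""
    blocks: List[List[str]] = []
    prev_end = [0] * len(seqs)
    for pos in anchors:
        blocks.append([s[a:p] for s, a, p in zip(seqs, prev_end, pos)])
        prev_end = [p + k for p in pos]
    blocks.append([s[a:] for s, a in zip(seqs, prev_end)])
    return blocks


seqs = ["TTACGTAAGGCCT", "TACGTAAAGGCCTT"]
anchors = find_anchors(seqs, k=4)
print(anchors)  # [(1, 0), (6, 6)]
print(split_by_anchors(seqs, anchors, k=4))  # [['T', ''], ['T', 'TA'], ['CCT', 'CCTT']]
```

In a divide-and-conquer aligner, each fragment group (one fragment per sequence) would then be aligned independently with a conventional MSA tool, and the anchors re-inserted between the aligned blocks; because these anchors are exact matches present in every sequence, they need no alignment themselves.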
Year: 2022
DOI: 10.1093/bib/bbab519
Venue: Briefings in Bioinformatics
Keywords: multiple sequence alignment, divide-and-conquer strategy, FM-index, common segments, dividing sequences
DocType: Journal
Volume: 23
Issue: 1
ISSN: 1467-5463
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name        Order  Citations  PageRank
Huan Liu    1      0          0.34
Quan Zou    2      558        67.61
Yun Xu      3      167        19.13