Title
CLAST: CUDA implemented large-scale alignment search tool.
Abstract
Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. Therefore, there is a clear need for a sequence similarity search tool that can rapidly detect weak similarity in large datasets.
We developed a tool, which we named CLAST (CUDA implemented large-scale alignment search tool), that enables analyses of millions of reads and thousands of reference genome sequences, and runs on NVIDIA Fermi architecture graphics processing units. CLAST has four main advantages over existing alignment tools. First, CLAST was capable of identifying sequence similarities ~80.8 times faster than BLAST and 9.6 times faster than BLAT. Second, CLAST executes global alignment as the default (local alignment is also an option), enabling CLAST to assign reads to taxonomic and functional groups based on evolutionarily distant nucleotide sequences with high accuracy. Third, CLAST does not need a preprocessed sequence database like Burrows-Wheeler Transform-based tools, and this enables CLAST to incorporate large, frequently updated sequence databases. Fourth, CLAST requires <2 GB of main memory, making it possible to run CLAST on a standard desktop computer or server node.
CLAST achieved very high speed (similar to the Burrows-Wheeler Transform-based Bowtie 2 for long reads) and sensitivity (equal to BLAST, BLAT, and FR-HIT) without the need for extensive database preprocessing or a specialized computing platform. Our results demonstrate that CLAST has the potential to be one of the most powerful and realistic approaches to analyze the massive amount of sequence data from next-generation sequencing technologies.
Year: 2014
DOI: 10.1186/s12859-014-0406-y
Venue: BMC Bioinformatics
Keywords: algorithms, bioinformatics, microarrays, biomedical research
Field: Sequence alignment, Genome, Data mining, CUDA, Nucleic acid sequence, Computer science, Metagenomics, Bioinformatics, Nearest neighbor search, Limiting, DNA microarray
DocType: Journal
Volume: 15
Issue: 1
ISSN: 1471-2105
Citations: 9
PageRank: 0.44
References: 10
Authors: 5
Name | Order | Citations, PageRank
Masahiro Yano | 1 | 100.79
Hiroshi Mori | 2 | 222.56
Yutaka Akiyama | 3 | 17237.62
Yamada, T. | 4 | 5917.08
Ken Kurokawa | 5 | 222.22