Title
Evolutionary placement of short sequence reads on multi-core architectures
Abstract
The application of high-performance computing methods in bioinformatics is becoming increasingly important because of the masses of data generated by novel short-read DNA sequencers. One important application of such short reads is the analysis of microbial communities, where the anonymous short reads need to be identified by sequence comparison to a set of reference sequences. This identification is required to analyze the microbial composition and biological diversity of the sample. We briefly introduce a new algorithm for evolutionary (phylogenetic) placement of short reads under the Maximum Likelihood criterion and implement it in RAxML. While this algorithm is significantly more accurate than plain pairwise sequence comparison, it can become highly compute-intensive when 100,000 or more reads, a typical number, need to be placed into an existing phylogenetic tree. Therefore, we deploy multi-grain parallelism to improve the parallel efficiency of this algorithm on 16-core and 32-core architectures. Via this multi-grain approach, we achieve parallel execution time improvements of 25% and super-linear speedups on 16 cores, as well as near-linear speedups and improvements exceeding 50% on 32 cores, on two large real-world microbial datasets. Evolutionary placement of 100,000 reads into a tree with more than 4,000 taxa now requires less than 2 hours of execution time on 32 cores.
Year: 2010
DOI: 10.1109/AICCSA.2010.5586973
Venue: Computer Systems and Applications
Keywords: large real-world microbial datasets, existing phylogenetic tree, execution time, multi-grain approach, evolutionary placement, microbial composition, important application, microbial community, multi-core architecture, multi-grain parallelism, new algorithm, short sequence, biological diversity, synchronization, phylogenetic tree, biology, pthreads, genetics, dna sequence, bioinformatics, maximum likelihood, instruction sets
Field: Synchronization, Phylogenetic tree, Supercomputer, Instruction set, Computer science, Parallel computing, Theoretical computer science, POSIX Threads, Real-time computing, Execution time, Multi-core processor, Maximum likelihood criterion
DocType: Conference
ISBN: 978-1-4244-7716-6
Citations: 2
PageRank: 0.61
References: 5
Authors: 3
Name                    Order   Citations   PageRank
Alexandros Stamatakis   1       995         96.27
Zsolt Komornik          2       2           0.61
Simon A. Berger         3       71          7.46