Title
PUNAS: A Parallel Ungapped-Alignment-Featured Seed Verification Algorithm for Next-Generation Sequencing Read Alignment
Abstract
Progress in next-generation sequencing has had a major impact on medical and genomic research. This technology can now produce billions of short DNA fragments (reads) in a single run. One of the most demanding computational problems, faced by almost every sequencing pipeline, is short-read alignment, i.e., determining where each fragment originated in the original genome. Most current solutions are based on a seed-and-extend approach, where promising candidate regions (seeds) are first identified and subsequently extended to verify whether a full high-scoring alignment actually exists in the vicinity of each seed. Seed verification is the main bottleneck in many state-of-the-art aligners, so finding fast solutions is of high importance. We present a parallel ungapped-alignment-featured seed verification (PUNAS) algorithm, a fast filter that effectively removes the majority of false-positive seeds and thus significantly accelerates the short-read alignment process. PUNAS is based on bit-parallelism and takes advantage of the SIMD vector units of modern microprocessors. Our implementation employs a vectorize-and-scale approach supporting multi-core CPUs and many-core Knights Landing (KNL)-based Xeon Phi processors. Performance evaluation reveals that PUNAS is over three orders of magnitude faster than seed verification with the Smith-Waterman algorithm and around one order of magnitude faster than seed verification with the banded version of Myers' bit-vector algorithm. Using a single thread, it achieves speedups of up to 7.3, 27.1, and 11.6 over the shifted Hamming distance filter on an SSE-, AVX2-, and AVX-512-based CPU/KNL, respectively. The speed of our framework further scales almost linearly with the number of cores. PUNAS is open-source software available at https://github.com/Xu-Kai/PUNASfilter.
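The idea of a bit-parallel, ungapped seed filter can be illustrated with a minimal sketch. This is not the PUNAS algorithm itself, whose details the abstract does not give. It only shows, under simplifying assumptions, how a read and a candidate reference window of equal length can be compared without gaps using word-level bit operations. The 2-bit encoding and the function names (`pack`, `mismatches`, `passes_filter`) are hypothetical choices made for this example.

```python
# Illustrative sketch only: a bit-parallel ungapped mismatch count.
# The encoding and all names here are assumptions, not taken from PUNAS.

ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(seq):
    """Pack a DNA string into one integer, 2 bits per base (first base in the low bits)."""
    word = 0
    for i, base in enumerate(seq):
        word |= ENCODE[base] << (2 * i)
    return word


def mismatches(read, ref_window):
    """Count mismatching positions between two equal-length sequences (no gaps)."""
    if len(read) != len(ref_window):
        raise ValueError("ungapped comparison needs equal-length sequences")
    n = len(read)
    # XOR leaves a non-zero 2-bit pair wherever the bases differ.
    diff = pack(read) ^ pack(ref_window)
    # Mask selecting the low bit of every 2-bit pair: ...010101.
    low_mask = int("01" * n, 2) if n else 0
    # Fold each pair onto its low bit, so one set bit marks one mismatch.
    per_base = (diff | (diff >> 1)) & low_mask
    # Population count of the folded word.
    return bin(per_base).count("1")


def passes_filter(read, ref_window, max_mismatches):
    """Keep a seed only if the ungapped comparison stays within the error budget."""
    return mismatches(read, ref_window) <= max_mismatches
```

In this sketch, a seed whose window fails `passes_filter` would be discarded before any expensive gapped verification. A real implementation would operate on fixed-width machine words and SIMD registers rather than Python integers.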
Year
2017
DOI
10.1109/IPDPS.2017.35
Venue
2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Keywords
Biological sequence analysis, SIMD, Xeon Phi, Bioinformatics
Field
Bottleneck, Computational problem, Computer science, Xeon Phi, Parallel computing, Algorithm, Field-programmable gate array, SIMD, Thread (computing), Hamming distance, Speedup, Distributed computing
DocType
Conference
ISSN
1530-2075
ISBN
978-1-5386-3915-3
Citations
1
PageRank
0.35
References
1
Authors
6
Name
Order
Citations
PageRank
Yuandong Chan1233.90
Kai Xu25620.13
Haidong Lan3273.26
Weiguo Liu4917.15
Yongchao Liu557832.80
Bertil Schmidt669953.00