Title | ||
---|---|---|
PUNAS: A Parallel Ungapped-Alignment-Featured Seed Verification Algorithm for Next-Generation Sequencing Read Alignment |
Abstract | ||
---|---|---|
The progress of next-generation sequencing has a major impact on medical and genomic research. This technology can now produce billions of short DNA fragments (reads) in a single run. One of the most demanding computational problems faced by almost every sequencing pipeline is short-read alignment; i.e., determining where in the original genome each fragment originated. Most current solutions are based on a seed-and-extend approach, where promising candidate regions (seeds) are first identified and subsequently extended in order to verify whether a full high-scoring alignment actually exists in the vicinity of each seed. Seed verification is the main bottleneck in many state-of-the-art aligners, and thus finding fast solutions is of high importance. We present a parallel ungapped-alignment-featured seed verification (PUNAS) algorithm, a fast filter for effectively removing the majority of false positive seeds, thus significantly accelerating the short-read alignment process. PUNAS is based on bit-parallelism and takes advantage of SIMD vector units of modern microprocessors. Our implementation employs a vectorize-and-scale approach supporting multi-core CPUs and many-core Knights Landing (KNL)-based Xeon Phi processors. Performance evaluation reveals that PUNAS is over three orders of magnitude faster than seed verification with the Smith-Waterman algorithm and around one order of magnitude faster than seed verification with the banded version of Myers' bit-vector algorithm. Using a single thread, it achieves a speedup of up to 7.3, 27.1, and 11.6 compared to the shifted Hamming distance filter on an SSE, AVX2, and AVX-512 based CPU/KNL, respectively. The speed of our framework further scales almost linearly with the number of cores. PUNAS is open-source software available at https://github.com/Xu-Kai/PUNASfilter. |
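
The abstract does not spell out the PUNAS algorithm itself, so the following is only a minimal sketch of the general idea behind a bit-parallel, ungapped seed filter: pack bases into 2 bits each, XOR the read against the candidate reference window, and count differing bases with a population count. The names `ENCODE`, `pack`, `mismatches`, and `passes_filter`, the 2-bit encoding, and the mismatch threshold are all assumptions made for illustration and are not taken from the PUNAS source code. Python integers stand in for the SIMD registers a real implementation would use.

```python
# Hypothetical 2-bit nucleotide encoding (an assumption, not the PUNAS layout).
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base (first base in the low bits)."""
    word = 0
    for i, base in enumerate(seq):
        word |= ENCODE[base] << (2 * i)
    return word


def mismatches(read: str, ref_window: str) -> int:
    """Count mismatching positions of two equal-length sequences with bitwise operations."""
    if len(read) != len(ref_window):
        raise ValueError("sequences must have equal length")
    n = len(read)
    diff = pack(read) ^ pack(ref_window)
    # Mask selecting the low bit of every 2-bit base slot: ...010101.
    low_mask = int("01" * n, 2) if n else 0
    # A base differs if either of its two bits differs, so fold the high bit
    # of each slot onto the low bit and keep one flag bit per base.
    per_base = (diff | (diff >> 1)) & low_mask
    return bin(per_base).count("1")  # population count


def passes_filter(read: str, reference: str, pos: int, max_mismatches: int) -> bool:
    """Keep a seed at `pos` only if the ungapped alignment has few enough mismatches."""
    window = reference[pos:pos + len(read)]
    if len(window) < len(read):
        return False  # candidate window runs off the end of the reference
    return mismatches(read, window) <= max_mismatches
```

A seed that passes such a filter would still need full verification by a gapped aligner; the filter only discards candidates cheaply. For example, `passes_filter("ACGT", "TTACGTTT", 2, 0)` accepts the exact match at offset 2, while the same read at offset 0 is rejected with a threshold of 1.
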
Year | DOI | Venue |
---|---|---|
2017 | 10.1109/IPDPS.2017.35 | 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) |
Keywords | Field | DocType |
Biological sequence analysis,SIMD,Xeon Phi,Bioinformatics | Bottleneck,Computational problem,Computer science,Xeon Phi,Parallel computing,Algorithm,Field-programmable gate array,SIMD,Thread (computing),Hamming distance,Speedup,Distributed computing | Conference |
ISSN | ISBN | Citations |
1530-2075 | 978-1-5386-3915-3 | 1 |
PageRank | References | Authors |
0.35 | 1 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yuandong Chan | 1 | 23 | 3.90 |
Kai Xu | 2 | 56 | 20.13 |
Haidong Lan | 3 | 27 | 3.26 |
Weiguo Liu | 4 | 91 | 7.15 |
Yongchao Liu | 5 | 578 | 32.80 |
Bertil Schmidt | 6 | 699 | 53.00 |