Title
DSA: Scalable Distributed Sequence Alignment System Using SIMD Instructions.
Abstract
Sequence alignment algorithms are a basic and critical component of many bioinformatics fields. With the rapid development of sequencing technology, fast-growing reference database volumes and longer query sequences pose new challenges for sequence alignment. However, the algorithms have prohibitively high time and space complexity. In this paper, we present DSA, a scalable distributed sequence alignment system that employs Apache Spark to process sequence data in a horizontally scalable distributed environment and leverages a data-parallel strategy based on Single Instruction Multiple Data (SIMD) instructions to parallelize the algorithms on each core of a worker node. The experimental results demonstrate that 1) DSA has outstanding performance and achieves up to 201x speedup over SparkSW, and 2) DSA has excellent scalability and achieves near-linear speedup as the number of nodes in the cluster increases.
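As a rough illustration of the workload the abstract describes, the sketch below scores one query against every sequence in a small reference set. It rests on several assumptions that are not in the abstract: Smith-Waterman local alignment with a linear gap penalty is used as the alignment algorithm, the scoring values (match 2, mismatch -1, gap -1) are arbitrary, and a local thread pool stands in for the distribution of references across Spark workers. The function names `sw_score` and `align_database` are hypothetical, and the scalar inner loop is only the kind of computation a SIMD implementation would vectorize; it is not DSA's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def sw_score(query, ref, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score (Smith-Waterman, linear gaps).

    Scoring parameters are illustrative assumptions, not values from the paper.
    Only the previous row of the score matrix is kept, so memory is O(len(ref)).
    """
    prev = [0] * (len(ref) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, r in enumerate(ref, start=1):
            diag = prev[j - 1] + (match if q == r else mismatch)
            up = prev[j] + gap        # gap in the reference
            left = curr[j - 1] + gap  # gap in the query
            score = max(0, diag, up, left)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def align_database(query, references, workers=4):
    """Score the query against each reference and rank by score, best first.

    The thread pool is a local stand-in (assumption) for partitioning the
    reference database across cluster workers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda ref: sw_score(query, ref), references))
    return sorted(zip(references, scores), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for ref, score in align_database("ACGT", ["TTTT", "ACGT", "ACGA"]):
        print(ref, score)
```

Each reference is scored independently, which is why this kind of database search partitions cleanly across nodes; the per-cell `max` over neighbouring cells is the part that data-parallel instructions would target.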
Year
2017
DOI
10.1109/CCGRID.2017.74
Venue
CCGrid
Keywords
distributed sequence alignment, Apache Spark, SIMD instruction, Alluxio, Scalability
DocType
Conference
Volume
abs/1701.01575
ISSN
2376-4414
Citations
1
PageRank
0.35
References
8
Authors
7
Name | Order | Citations | PageRank
Bo Xu | 1 | 14 | 6.71
Changlong Li | 2 | 26 | 6.88
Hang Zhuang | 3 | 26 | 6.54
Jiali Wang | 4 | 9 | 3.45
Qingfeng Wang | 5 | 18 | 7.53
Jinhong Zhou | 6 | 9 | 3.40
Xuehai Zhou | 7 | 551 | 77.54