Title
SDAFT: A novel scalable data access framework for parallel BLAST.
Abstract
To run search tasks in a parallel and load-balanced fashion, existing parallel BLAST schemes such as mpiBLAST introduce a data initialization preparation stage that moves database fragments from shared storage to local cluster nodes. Unfortunately, in today's big data era, a quickly growing sequence database becomes too heavy to move across the network. In this paper, we develop a Scalable Data Access Framework (SDAFT) to solve the problem. It employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. SDAFT consists of two interlocked components: 1) a data-centric load-balanced scheduler (DC-scheduler) that enforces data-process locality and 2) a translation layer that translates conventional parallel I/O operations into HDFS I/O. In experiments with our SDAFT prototype system, using a real-world database and queries on a wide variety of computing platforms, we found that SDAFT can reduce I/O cost by a factor of 4 to 10 and double the overall execution performance compared with existing schemes.
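The idea of data-process locality with load balancing mentioned in the abstract can be illustrated with a small sketch. This is not SDAFT's DC-scheduler, whose algorithm the abstract does not describe; it is a simple greedy heuristic written for illustration. The function name `assign_fragments`, the fragment names, and the node names are all hypothetical, and the replica map stands in for the block-location information that a DFS such as HDFS would report.

```python
from collections import defaultdict


def assign_fragments(replicas, nodes):
    """Greedy locality-aware assignment (illustrative only, not SDAFT's DC-scheduler).

    replicas: dict mapping fragment name -> list of nodes holding a copy
    nodes:    list of worker node names
    Returns a dict mapping node -> list of fragments it should search.
    """
    load = {node: 0 for node in nodes}
    assignment = defaultdict(list)
    # Place fragments with the fewest replica holders first,
    # since they have the least freedom in where they can run locally.
    for frag in sorted(replicas, key=lambda f: (len(replicas[f]), f)):
        local = [n for n in replicas[frag] if n in load]
        # Fall back to any node (a remote read) if no worker holds a replica.
        candidates = local or list(nodes)
        # Pick the least-loaded candidate; break ties by node name.
        target = min(candidates, key=lambda n: (load[n], n))
        assignment[target].append(frag)
        load[target] += 1
    return dict(assignment)


# Hypothetical replica layout for six database fragments on three nodes.
replicas = {
    "frag0": ["n1", "n2"],
    "frag1": ["n1", "n3"],
    "frag2": ["n2", "n3"],
    "frag3": ["n1", "n2"],
    "frag4": ["n3"],
    "frag5": ["n2", "n3"],
}
plan = assign_fragments(replicas, ["n1", "n2", "n3"])
for node in sorted(plan):
    print(node, plan[node])
```

In this toy layout every fragment ends up on a node that already stores a copy, and each node receives two fragments, so no fragment has to cross the network before the search starts. A real scheduler would also need to account for uneven fragment sizes and search times, which this sketch ignores.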
Year: 2014
DOI: 10.1145/2534645.2534647
Venue: Parallel Computing
Keywords: data initialization preparation stage, parallel blast scheme, novel scalable data access, sdaft prototype system, conventional parallel, scalable data access, sequence database, database fragment, real-world database, parallel sequence search, big data era
Field: Distributed File System, Database-centric architecture, Locality, Computer science, Parallel computing, Big data, Data access, Scalability, Distributed computing
DocType: Journal
Volume: 40
Issue: 10
ISSN: 0167-8191
Citations: 4
PageRank: 0.41
References: 20
Authors: 4
Name            Order  Citations  PageRank
Jiangling Yin   1      44         9.85
Junyao Zhang    2      16         3.37
Jun Wang        3      420        25.60
Wu-chun Feng    4      2812       232.50