Abstract |
---|
To run search tasks in a parallel, load-balanced fashion, existing parallel BLAST schemes such as mpiBLAST introduce a data-initialization stage that moves database fragments from shared storage to local cluster nodes. Unfortunately, rapidly growing sequence databases have become too large to move across the network in today's big data era. In this paper, we develop a Scalable Data Access Framework (SDAFT) to solve this problem. It employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. SDAFT consists of two interlocked components: 1) a data-centric load-balanced scheduler (DC-scheduler) that enforces data-process locality and 2) a translation layer that translates conventional parallel I/O operations into HDFS I/O. Experiments with our SDAFT prototype on real-world databases and queries across a wide variety of computing platforms show that SDAFT reduces I/O cost by a factor of 4 to 10 and doubles overall execution performance compared with existing schemes. |
Year | DOI | Venue |
---|---|---|
2014 | 10.1145/2534645.2534647 | Parallel Computing |
Keywords | Field | DocType |
---|---|---|
data initialization preparation stage, parallel BLAST scheme, novel scalable data access, SDAFT prototype system, conventional parallel, scalable data access, sequence database, database fragment, real-world database, parallel sequence search, big data era | Distributed File System, Database-centric architecture, Locality, Computer science, Parallel computing, Big data, Data access, Scalability, Distributed computing | Journal |
Volume | Issue | ISSN |
---|---|---|
40 | 10 | 0167-8191 |
Citations | PageRank | References |
---|---|---|
4 | 0.41 | 20 |
Authors |
---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jiangling Yin | 1 | 44 | 9.85 |
Junyao Zhang | 2 | 16 | 3.37 |
Jun Wang | 3 | 420 | 25.60 |
Wu-chun Feng | 4 | 2812 | 232.50 |