Abstract |
---|
To run search tasks in a parallel, load-balanced fashion, existing parallel BLAST schemes such as mpiBLAST introduce a data-initialization stage that moves database fragments from shared storage to local cluster nodes. Unfortunately, rapidly growing sequence databases have become too large to move across the network in today's big data era. In this paper, we develop a Scalable Data Access Framework (SDAFT) to solve this problem. It employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. SDAFT consists of two interlocked components: 1) a data-centric load-balanced scheduler (DC-scheduler) that enforces data-process locality and 2) a translation layer that translates conventional parallel I/O operations into HDFS I/O. Experiments with our SDAFT prototype on real-world databases and queries across a wide variety of computing platforms show that SDAFT reduces I/O cost by a factor of 4 to 10 and doubles overall execution performance compared with existing schemes. |
Year | DOI | Venue |
---|---|---|
2014 | 10.1145/2534645.2534647 | Parallel Computing |
Keywords | Field | DocType |
---|---|---|
data initialization preparation stage, parallel BLAST scheme, novel scalable data access, SDAFT prototype system, conventional parallel, scalable data access, sequence database, database fragment, real-world database, parallel sequence search, big data era | Distributed File System, Database-centric architecture, Locality, Computer science, Parallel computing, Big data, Data access, Scalability, Distributed computing | Journal |
Volume | Issue | ISSN |
---|---|---|
40 | 10 | 0167-8191 |
Citations | PageRank | References |
---|---|---|
4 | 0.41 | 20 |
Authors |
---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jiangling Yin | 1 | 44 | 9.85 |
Junyao Zhang | 2 | 16 | 3.37 |
Jun Wang | 3 | 420 | 25.60 |
Wu-chun Feng | 4 | 2812 | 232.50 |