Abstract | ||
---|---|---|
As high-throughput, low-cost sequencing technologies produce massive volumes of genomic data, new solutions for handling data-intensive applications on parallel platforms are urgently required. In particular, the nature of the processing leads to both load balancing and I/O contention challenges. In this paper, we present a novel middleware system, RE-PAGE, which allows applications that process genomic data to be parallelized through a simple, high-level API. To address load balancing and I/O contention, the middleware includes the following features: 1) use of domain-specific information in the formation of data chunks (which can be of non-uniform sizes), 2) intelligent replication and placement of each chunk on a small number of nodes, and 3) scheduling schemes for achieving load balance when data movement costs outweigh processing costs and the chunks are of non-uniform sizes. We have evaluated our framework using three genomic applications: VarScan, Unified Genotyper, and Coverage Analyzer. We show that our approach leads to better performance than conventional MapReduce scheduling approaches and systems that access data from a centralized store. We also compare against popular frameworks, Hadoop and GATK, and show that our middleware outperforms both, achieving high parallel efficiency and scalability. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1109/CLUSTER.2015.54 | Cluster Computing |
Keywords | Field | DocType |
Parallel Computing,Middleware Systems,Genomic Applications | Middleware,Load management,Middleware (distributed applications),Load balancing (computing),Scheduling (computing),Computer science,Parallel processing,Parallel computing,Real-time computing,Processor scheduling,Distributed computing,Scalability | Conference |
ISSN | Citations | PageRank |
1552-5244 | 0 | 0.34 |
References | Authors |
27 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mucahid Kutlu | 1 | 38 | 14.16 |
Gagan Agrawal | 2 | 2058 | 209.59 |