Title
CooMR: cross-task coordination for efficient data management in MapReduce programs
Abstract
Hadoop is a widely adopted open source implementation of the MapReduce programming model for big data processing. It represents system resources as available map and reduce slots and assigns them to various tasks. This execution model gives little regard to the need for cross-task coordination on the use of shared system resources on a compute node, which results in task interference. In addition, the existing Hadoop merge algorithm can cause excessive I/O. In this study, we undertake an effort to address both issues. Accordingly, we have designed a cross-task coordination framework called CooMR for efficient data management in MapReduce programs. CooMR consists of three component schemes: cross-task opportunistic memory sharing and log-structured I/O consolidation, which are designed to facilitate task coordination, and the key-based in-situ merge (KISM) algorithm, which is designed to enable the sorting/merging of Hadoop intermediate data without actually moving the <key, value> pairs. Our evaluation demonstrates that CooMR is able to increase task coordination, improve system resource utilization, and significantly speed up the execution of MapReduce programs.
Year
2013
DOI
10.1145/2503210.2503276
Venue
SC
Keywords
efficient data management,hadoop intermediate data,big data processing,cross-task opportunistic memory sharing,task coordination,shared system resource,cross-task coordination framework,mapreduce programming model,cross-task coordination,mapreduce program,resource management,interference
Field
Merge algorithm,Resource management,Resource (disambiguation),Programming paradigm,Computer science,Parallel computing,Sorting,Execution model,Data management,Distributed computing,Speedup
DocType
Conference
Citations
7
PageRank
0.50
References
18
Authors
5
Name            Order   Citations   PageRank
Xiaobing Li     1       10          0.90
Yandong Wang    2       342         18.88
Yizheng Jiao    3       45          2.30
Cong Xu         4       50          4.38
Weikuan Yu      5       1042        77.40