Title
Joint scheduling of MapReduce jobs with servers: Performance bounds and experiments
Abstract
MapReduce has achieved tremendous success for large-scale data processing in data centers. A key feature distinguishing MapReduce from previous parallel models is that it interleaves parallel and sequential computation. Past schemes for general parallel models, and especially their theoretical bounds, are therefore unlikely to apply to MapReduce directly. Many recent studies address MapReduce job and task scheduling, but they assume that the servers are assigned in advance. In current data centers, multiple MapReduce jobs of different importance levels run together. In this paper, we investigate a scheduling problem for MapReduce that takes server assignment into consideration as well. We formulate a MapReduce server-job organizer problem (MSJO) and show that it is NP-complete. We develop a 3-approximation algorithm and a fast heuristic. We evaluate our algorithms through both simulations and experiments on Amazon EC2 with an implementation in Hadoop. The results confirm the advantage of our algorithms.
Year
2014
DOI
10.1109/INFOCOM.2014.6848160
Venue
INFOCOM
Keywords
large-scale data processing,mapreduce server-job organizer problem,amazon ec2,server assignment,joint scheduling,scheduling,approximation theory,parallel programming,parallel models,parallel computation,np-complete problem,3-approximation algorithm,computational complexity,data centers,sequential computation,msjo,task scheduling,hadoop,job scheduling,fast heuristic
Field
Multiprocessor scheduling,Fair-share scheduling,Computer science,Parallel computing,Flow shop scheduling,Server,Two-level scheduling,Rate-monotonic scheduling,Earliest deadline first scheduling,Dynamic priority scheduling,Distributed computing
DocType
Conference
ISSN
0743-166X
Citations
14
PageRank
0.59
References
14
Authors
3
Name
Order
Citations
PageRank
Jiahai Yang120053.58
Dan Wang216913.41
Jiangchuan Liu34340310.86