Title
A Pareto-based scheduler for exploring cost-performance trade-offs for MapReduce workloads
Abstract
In recent years, we have observed an increasing demand for processing large amounts of data. The MapReduce programming model has been adopted by major computing companies and integrated into novel cyber-physical systems (CPS) to perform large-scale data processing. However, efficiently scheduling MapReduce workloads in cluster environments such as Amazon’s EC2 is challenging because of the trade-off between performance and the corresponding monetary cost. The problem is exacerbated by the fact that cloud providers tend to charge users for their I/O operations, which can dramatically increase the spending budget. In this paper, we describe our approach for scheduling MapReduce workloads in cluster environments while taking the performance/budget trade-off into account. Our approach makes the following contributions: (i) we propose a novel Pareto-based scheduler that identifies near-optimal resource allocations for user workloads with respect to both performance and monetary cost, and (ii) we develop an automatic configuration of basic task parameters that allows us to further minimize the user’s spending budget and the jobs’ execution times. Our detailed experimental evaluation using both real and synthetic datasets illustrates that our approach improves the performance of the workloads by as much as 50% compared to its competitors.
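The Pareto-based selection named in contribution (i) can be illustrated with a minimal sketch. It assumes each candidate resource allocation already has an estimated execution time and monetary cost; the `Allocation` type, the helper functions, and the candidate configurations and numbers are all hypothetical and not taken from the paper. The sketch is a generic non-dominated filter, not the authors' scheduler: it keeps the allocations that no other candidate beats on both objectives, then picks the fastest one that fits a given budget.

```python
from typing import List, NamedTuple, Optional


class Allocation(NamedTuple):
    """Hypothetical candidate resource allocation for a MapReduce workload."""
    name: str
    time_s: float    # estimated execution time in seconds (assumed given)
    cost_usd: float  # estimated monetary cost in USD (assumed given)


def dominates(a: Allocation, b: Allocation) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a.time_s <= b.time_s and a.cost_usd <= b.cost_usd
    better = a.time_s < b.time_s or a.cost_usd < b.cost_usd
    return no_worse and better


def pareto_front(candidates: List[Allocation]) -> List[Allocation]:
    """Keep only non-dominated allocations, ordered from fastest to slowest."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(front, key=lambda c: c.time_s)


def fastest_within_budget(front: List[Allocation],
                          budget_usd: float) -> Optional[Allocation]:
    """Pick the fastest Pareto-optimal allocation whose cost fits the budget."""
    affordable = [c for c in front if c.cost_usd <= budget_usd]
    return min(affordable, key=lambda c: c.time_s, default=None)


# Hypothetical estimates for illustration only; not measurements from the paper.
candidates = [
    Allocation("4 small VMs", 1200.0, 0.40),
    Allocation("6 small VMs", 900.0, 0.70),   # dominated by "8 small VMs"
    Allocation("8 small VMs", 700.0, 0.60),
    Allocation("4 large VMs", 650.0, 0.90),
    Allocation("8 large VMs", 400.0, 1.50),
]

front = pareto_front(candidates)
for c in front:
    print(f"{c.name}: {c.time_s:.0f} s, ${c.cost_usd:.2f}")
print(fastest_within_budget(front, budget_usd=1.00).name)  # → 4 large VMs
```

The pairwise dominance check is quadratic in the number of candidates, which is adequate for a small illustration; a scheduler exploring a large configuration space would likely need a more scalable search to find near-optimal allocations.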
Year
2017
DOI
10.1186/s13639-017-0077-7
Venue
EURASIP J. Emb. Sys.
Keywords
MapReduce, Scheduling, CPS, Big data
Field
Data processing, Programming paradigm, Scheduling (computing), Computer science, Parallel computing, Real-time computing, Cyber-physical system, Resource allocation, Big data, Pareto principle, Distributed computing, Cloud computing
DocType
Journal
Volume
2017
ISSN
1687-3955
Citations
1
PageRank
0.36
References
31
Authors
2
Name              Order  Citations  PageRank
Nikos Zacheilas   1      79         9.40
Vana Kalogeraki   2      1686       124.40