Title
Predicting Job Completion Time in Heterogeneous MapReduce Environments
Abstract
MapReduce is a popular paradigm for processing big data due to the wide availability of open-source implementations such as Apache Hadoop. The framework is primarily designed for optimal execution of an application on a commodity cluster of homogeneous nodes, where all machines in the cluster have identical hardware configurations. However, most organizations have surplus, unused heterogeneous infrastructure that accumulates over time; it may include machines with varying numbers of CPU cores, amounts of RAM, and disk speeds. The question for an organization is whether it is efficient to set up a heterogeneous Hadoop cluster on the available set of differently configured nodes for executing its analytic workload. Doing so may help organizations reduce their e-waste and the cost of data analytics. In this paper, we propose a simulator-based what-if engine to predict the job execution time of a MapReduce-based application for varying cluster sizes, varying types of heterogeneity in the cluster, and growing data sizes. The simulator has been validated for three open-source MapReduce benchmarks and two industrial Hive-based analytic workloads on three different heterogeneous clusters for data sizes up to 100 GB. The largest cluster considered has 484 cores, with three types of hardware nodes. We have observed the average prediction error to be within 10% of the actual job execution time.
Year
2016
DOI
10.1109/IPDPSW.2016.10
Venue
2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Keywords
MapReduce, heterogeneous, simulation, Hive, Prediction, Execution Time, Financial, Telecom
Field
Data modeling, Cluster (physics), Computer science, Workload, Parallel computing, Implementation, Execution time, Big data, Multi-core processor, Benchmark (computing), Distributed computing
DocType
Conference
ISSN
2164-7062
ISBN
978-1-5090-3683-7
Citations
7
PageRank
0.63
References
12
Authors
2
Name | Order | Citations | PageRank
Rekha Singhal | 1 | 16 | 8.22
Abhishek Verma | 2 | 7 | 0.63