Title
Power Efficient MapReduce Workload Acceleration Using Integrated-GPU
Abstract
With the pervasiveness of MapReduce, one of the most prominent programming models for data parallelism in Apache Hadoop, many researchers and developers have spent tremendous effort attempting to boost the computational speed and energy efficiency of MapReduce-based big data processing. However, the scalable and fault-tolerant nature of MapReduce introduces additional costs in disk I/O and data transfer, caused by streaming intermediate outputs to disk. In light of these issues, many research projects have been initiated with the goal of improving the compute speed and power efficiency of compute-intensive cloud computing workloads, several of them by adding discrete GPUs. In this work, we present a modified MapReduce approach, focused on the iterative clustering algorithms in the Apache Mahout machine learning library, that leverages the acceleration potential of the Intel integrated GPU in a multi-node cluster environment. The accelerated framework shows varying levels of speed-up (approximately 45x for Map tasks only, approximately 4.37x for the entire K-means clustering) as evaluated using the HiBench benchmark suite. Based on various experiments and in-depth analysis, we find that utilizing the integrated GPU via OpenCL offers significant performance and power efficiency gains over the original CPU-based approach. Further analysis is also done to understand the correlations between compute, I/O, and power efficiency. Our results show that embracing the integrated GPU in the Hadoop MapReduce framework represents a promising advance in adding cost- and energy-efficient compute parallelism to a data-parallel multi-node environment.
Year
2015
DOI
10.1109/BigDataService.2015.12
Venue
BigDataService
Keywords
GPGPU, Integrated Graphics, Hadoop, Big Data, Mahout, Machine Learning, OpenCL
Field
Programming paradigm, Computer science, Efficient energy use, Parallel computing, Data parallelism, General-purpose computing on graphics processing units, Cluster analysis, Big data, Scalability, Cloud computing
DocType
Conference
Citations
0
PageRank
0.34
References
22
Authors
6
Name | Order | Citations | PageRank
Sungye Kim | 1 | 138 | 11.54
Jeremy Bottleson | 2 | 3 | 1.74
Jingyi Jin | 3 | 0 | 0.68
Preeti Bindu | 4 | 0 | 0.34
Snehal C. Sakhare | 5 | 0 | 0.34
Joseph S. Spisak | 6 | 0 | 0.34