Title
Characterizing and benchmarking stand-alone Hadoop MapReduce on modern HPC clusters.
Abstract
With the emergence of high-performance data analytics, the Hadoop platform is increasingly used to process data stored on high-performance computing clusters. There is immense scope for improving the performance of Hadoop MapReduce (including the network-intensive shuffle phase) on these modern clusters, which are equipped with high-speed interconnects such as InfiniBand and 10/40 GigE and with storage systems such as SSDs and Lustre. To exploit that scope, it is essential to study the MapReduce component in isolation. In this paper, we study popular MapReduce workloads, obtained from well-accepted, comprehensive benchmark suites, to identify common shuffle data distribution patterns. We determine the environmental and workload-specific factors that affect the performance of a MapReduce job. Based on these characterization studies, we propose a micro-benchmark suite that can be used to evaluate the performance of stand-alone Hadoop MapReduce, and demonstrate its ease of use with different networks/protocols, Hadoop distributions, and storage architectures. Performance evaluations with our proposed micro-benchmarks show that stand-alone Hadoop MapReduce over IPoIB performs better than over 10 GigE by about 13–15%, and that the RDMA-enhanced hybrid MapReduce design can achieve up to 43% performance improvement over default Hadoop MapReduce over IPoIB, in both shared-nothing and shared storage architectures.
Year: 2016
DOI: https://doi.org/10.1007/s11227-016-1760-5
Venue: The Journal of Supercomputing
Keywords: Big Data, Hadoop MapReduce, Micro-benchmarks, High-performance networks, RDMA, InfiniBand
Field: Suite, InfiniBand, Data analysis, Computer science, Parallel computing, Remote direct memory access, Lustre (file system), Big data, Benchmarking, Operating system, Performance improvement
DocType: Journal
Volume: 72
Issue: 12
ISSN: 0920-8542
Citations: 1
PageRank: 0.39
References: 22
Authors: 5
Name                    Order  Citations  PageRank
Dipti Shankar           1      120        10.71
Xiaoyi Lu               2      602        60.53
Md. Wasi-ur-Rahman      3      412        26.84
Nusrat S. Islam         4      229        14.08
Dhabaleswar K. Panda    5      5366       446.70