Title
Benchmarking Harp-DAAL: High Performance Hadoop on KNL Clusters
Abstract
Data analytics is undergoing a revolution in many scientific domains and demands cost-effective parallel data analysis techniques. Traditional Java-based Big Data processing tools like Hadoop MapReduce are designed for commodity CPUs. In contrast, emerging manycore processors like the Xeon Phi have an order of magnitude greater computation power and memory bandwidth. To harness their computing capabilities, we propose the Harp-DAAL framework. We show that enhanced versions of MapReduce can be replaced by Harp, a Hadoop plug-in that offers useful data abstractions for both high-performance iterative computation and MPI-quality communication and that can drive Intel's native DAAL library. We select a subset of three machine learning algorithms and implement them within Harp-DAAL. Our scalability benchmarks ran on Knights Landing (KNL) clusters and achieved up to a 2.5 times speedup over the HPC solution in NOMAD and a 15 to 40 times speedup over Java-based solutions in Spark. We further quantify the workloads on a single KNL node with a performance breakdown at the micro-architecture level.
Year
2017
DOI
10.1109/CLOUD.2017.19
Venue
2017 IEEE 10th International Conference on Cloud Computing (CLOUD)
Keywords
HPC, Xeon Phi, BigData
Field
Data structure, Memory bandwidth, Spark (mathematics), Computer science, Xeon Phi, Parallel computing, Java, Big data, Operating system, Scalability, Speedup
DocType
Conference
ISSN
2159-6182
ISBN
978-1-5386-1994-0
Citations
1
PageRank
0.35
References
9
Authors
13
Name | Order | Citations | PageRank
Langshi Chen | 1 | 2 | 0.69
Bo Peng | 2 | 9 | 2.91
Bingjing Zhang | 3 | 521 | 25.17
Bingjing Zhang | 4 | 521 | 25.17
Tony Liu | 5 | 1 | 0.35
Yiming Zou | 6 | 1 | 0.35
Lei Jiang | 7 | 12 | 1.69
Robert Henschel | 8 | 106 | 10.85
Craig A. Stewart | 9 | 259 | 42.68
Emily McCallum | 10 | 1 | 0.35
Zahniser Tom | 11 | 1 | 0.35
Omer Jon | 12 | 1 | 0.35
Judy Qiu | 13 | 3 | 2.07