Title
JVM-Bypass for Efficient Hadoop Shuffling
Abstract
Hadoop employs a Java-based network transport stack on top of the Java Virtual Machine (JVM) for its data shuffling and merging. Our examination reveals that the JVM introduces a significant amount of overhead to the data processing capability of the native interface. Furthermore, the JVM constrains the use of high-performance networking mechanisms such as RDMA (Remote Direct Memory Access), which has established itself as an effective data movement technology in many networking environments because of its low latency, high bandwidth, low CPU utilization, and energy efficiency. In this paper, we introduce a plug-in library called JVM-Bypass Shuffling (JBS) for Hadoop data shuffling. JBS helps Hadoop data shuffling by avoiding Java-based transport protocols, removing the overhead and limitations of the JVM. In addition, we design JBS as a portable library that can leverage both TCP/IP and RDMA on different network systems such as InfiniBand and 1/10 Gigabit Ethernet. We have designed and implemented JBS as part of Hadoop acceleration. It has been transferred to Mellanox as the software product UDA (Unstructured Data Accelerator) and used to enable our studies on a variety of merging algorithms. Our performance evaluation demonstrates that JBS can effectively reduce the execution time of Hadoop jobs by up to 66.3% and lower the CPU utilization by 48.1%.
Year: 2013
DOI: 10.1109/IPDPS.2013.13
Venue: IPDPS
Keywords: jvm,high-performance networking mechanism,cpu utilization,parallel processing,high performance interconnect,mellanox,mapreduce,rdma,hadoopdata shuffling,ethernet,different network system,hadoop acceleration,unstructured data accelerator,plug-in library,jvm-bypass shuffling,virtual machines,hadoop data,merging algorithm,java-based network transport stack,java-based network transport,java-based transport protocol,tcp/ip,java virtual machine,hadoop data shuffling,efficient hadoop shuffling,remote direct memory access,hadoop job,effective data movement technology,energy efficiency,java,infiniband,data movement technology,software product uda,bandwidth,tcp ip,merging,protocols
Field: Central processing unit,Virtual machine,InfiniBand,CPU time,Computer science,Parallel computing,Unstructured data,Shuffling,Remote direct memory access,Java,Operating system,Distributed computing
DocType: Conference
ISSN: 1530-2075
ISBN: 978-1-4673-6066-1
Citations: 14
PageRank: 0.85
References: 14
Authors: 4
Name, Order, Citations, PageRank
Yandong Wang, 1, 342, 18.88
Cong Xu, 2, 50, 4.38
Xiaobing Li, 3, 14, 0.85
Weikuan Yu, 4, 1042, 77.40