Title: Accelerating TensorFlow with Adaptive RDMA-Based gRPC
Abstract
Google's TensorFlow is one of the most popular Deep Learning frameworks today. Distributed TensorFlow supports multiple channels for transferring tensors efficiently, such as gRPC over TCP/IP, gRPC+Verbs, and gRPC+MPI. However, the community lacks a thorough characterization of these communication channels, which is critical because high-performance Deep Learning with TensorFlow requires an efficient communication runtime. We therefore conduct a thorough analysis of the communication characteristics of distributed TensorFlow. Our study shows that none of the existing channels in TensorFlow can provide adaptive, efficient communication for Deep Learning workloads across different message sizes. Moreover, the community must maintain all of these channels, while users are expected to tune them to obtain the desired performance. This paper therefore proposes a unified approach: a single gRPC runtime (AR-gRPC) in TensorFlow with adaptive and efficient RDMA protocols. In AR-gRPC, we propose designs such as hybrid communication protocols, message pipelining and coalescing, and zero-copy transmission to make the runtime adaptive to different message sizes in Deep Learning workloads. Our performance evaluations show that AR-gRPC speeds up gRPC by up to 4.1x over the default gRPC design on IPoIB and by up to 2.3x over another RDMA-based gRPC design in the community. On the Comet supercomputer, the AR-gRPC design reduces point-to-point latency by up to 75% compared to the default gRPC design. By integrating AR-gRPC with TensorFlow, we achieve up to 3x distributed training speedup over default gRPC-IPoIB-based TensorFlow.
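The abstract's core idea, choosing a transfer strategy by message size, can be sketched as follows. This is not the authors' code; the threshold values, strategy names, and chunk size are hypothetical assumptions chosen only to illustrate a size-adaptive hybrid protocol with pipelined chunking.

```python
# Illustrative sketch of a size-adaptive hybrid communication protocol,
# in the spirit of AR-gRPC. All thresholds and names are hypothetical.

EAGER_LIMIT = 8 * 1024          # hypothetical: small payloads sent eagerly
CHUNK_LIMIT = 1 * 1024 * 1024   # hypothetical: medium payloads pipelined

def choose_protocol(msg_bytes: int) -> str:
    """Pick a transfer strategy based on payload size."""
    if msg_bytes <= EAGER_LIMIT:
        # Small messages: an eager copy into a pre-registered buffer
        # avoids per-message registration cost and extra round trips.
        return "eager-send"
    if msg_bytes <= CHUNK_LIMIT:
        # Medium messages: split into chunks so copying one chunk
        # overlaps with transmitting the previous one (pipelining).
        return "chunked-pipeline"
    # Large messages: register the user buffer once and transfer it
    # with zero-copy RDMA, skipping intermediate copies entirely.
    return "zero-copy-rdma"

def pipeline_chunks(msg_bytes: int, chunk: int = 64 * 1024):
    """Split a payload into (offset, length) pieces for pipelined transfer."""
    return [(off, min(chunk, msg_bytes - off))
            for off in range(0, msg_bytes, chunk)]
```

The design point being illustrated: no single strategy wins across message sizes, so the runtime dispatches per message rather than asking users to pick and tune one channel.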
Year: 2018
DOI: 10.1109/HiPC.2018.00010
Venue: HiPC
Keywords: Payloads, Protocols, Deep learning, Training, Runtime, Servers
Field: Pipeline (computing), Supercomputer, Computer science, Server, Communication channel, Artificial intelligence, Remote direct memory access, Deep learning, Speedup, Distributed computing, Communications protocol
DocType: Conference
ISSN: 1094-7256
ISBN: 978-1-5386-8386-6
Citations: 2
PageRank: 0.50
References: 0
Authors: 3
Name                   Order  Citations  PageRank
Rajarshi Biswas        1      2          1.17
Xiaoyi Lu              2      602        60.53
Dhabaleswar K. Panda   3      5366       446.70