Title: Accelerating TensorFlow with Adaptive RDMA-Based gRPC
Abstract
Google's TensorFlow is one of the most popular Deep Learning frameworks today. Distributed TensorFlow supports multiple channels for transferring tensors efficiently, such as gRPC over TCP/IP, gRPC+Verbs, and gRPC+MPI. However, the community lacks a thorough characterization of these communication channels, which is critical because high-performance Deep Learning with TensorFlow requires an efficient communication runtime. We therefore conduct a thorough analysis of the communication characteristics of distributed TensorFlow. Our study shows that none of the existing channels in TensorFlow can provide adaptive, efficient communication for Deep Learning workloads across different message sizes. Moreover, the community must maintain all of these channels, while users are expected to tune them to obtain the desired performance. This paper therefore proposes a unified approach: a single gRPC runtime (AR-gRPC) in TensorFlow with adaptive and efficient RDMA protocols. In AR-gRPC, we propose designs such as hybrid communication protocols, message pipelining and coalescing, and zero-copy transmission to make the runtime adaptive to different message sizes in Deep Learning workloads. Our performance evaluations show that AR-gRPC speeds up gRPC by up to 4.1x over the default gRPC design on IPoIB and by up to 2.3x over another RDMA-based gRPC design in the community. On the Comet supercomputer, the AR-gRPC design reduces point-to-point latency by up to 75% compared to the default gRPC design. By integrating AR-gRPC with TensorFlow, we achieve up to 3x distributed training speedup over default gRPC-IPoIB-based TensorFlow.
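The abstract's core idea, choosing a transfer strategy by message size, can be sketched as follows. This is not the authors' code; the threshold values, strategy names, and chunk size are hypothetical assumptions chosen only to illustrate a size-adaptive hybrid protocol with pipelined chunking.

```python
# Illustrative sketch of a size-adaptive hybrid communication protocol,
# in the spirit of AR-gRPC. All thresholds and names are hypothetical.

EAGER_LIMIT = 8 * 1024          # hypothetical: small payloads sent eagerly
CHUNK_LIMIT = 1 * 1024 * 1024   # hypothetical: medium payloads pipelined

def choose_protocol(msg_bytes: int) -> str:
    """Pick a transfer strategy based on payload size."""
    if msg_bytes <= EAGER_LIMIT:
        # Small messages: an eager copy into a pre-registered buffer
        # avoids per-message registration cost and extra round trips.
        return "eager-send"
    if msg_bytes <= CHUNK_LIMIT:
        # Medium messages: split into chunks so copying one chunk
        # overlaps with transmitting the previous one (pipelining).
        return "chunked-pipeline"
    # Large messages: register the user buffer once and transfer it
    # with zero-copy RDMA, skipping intermediate copies entirely.
    return "zero-copy-rdma"

def pipeline_chunks(msg_bytes: int, chunk: int = 64 * 1024):
    """Split a payload into (offset, length) pieces for pipelined transfer."""
    return [(off, min(chunk, msg_bytes - off))
            for off in range(0, msg_bytes, chunk)]
```

The design point being illustrated: no single strategy wins across message sizes, so the runtime dispatches per message rather than asking users to pick and tune one channel.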
Year: 2018
DOI: 10.1109/HiPC.2018.00010
Venue: HiPC
Keywords: Payloads, Protocols, Deep learning, Training, Runtime, Servers
Field: Pipeline (computing), Supercomputer, Computer science, Server, Communication channel, Artificial intelligence, Remote direct memory access, Deep learning, Speedup, Distributed computing, Communications protocol
DocType: Conference
ISSN: 1094-7256
ISBN: 978-1-5386-8386-6
Citations: 2
PageRank: 0.50
References: 0
Authors: 3
Name                   Order  Citations  PageRank
Rajarshi Biswas        1      2          1.17
Xiaoyi Lu              2      602        60.53
Dhabaleswar K. Panda   3      5366       446.70