Title
A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning
Abstract
With huge amounts of training data, deep learning has achieved great breakthroughs in many artificial intelligence (AI) applications. However, such large-scale data sets present computational challenges, requiring training to be distributed across a cluster equipped with accelerators like GPUs. With the rapid increase of GPU computing power, data communication among GPUs has become a potential bottleneck for overall training performance. In this paper, we first propose a general directed acyclic graph (DAG) model to describe the distributed synchronous stochastic gradient descent (S-SGD) algorithm, which has been widely used in distributed deep learning frameworks. To understand the practical impact of data communication on training performance, we conduct extensive empirical studies on four state-of-the-art distributed deep learning frameworks (i.e., Caffe-MPI, CNTK, MXNet, and TensorFlow) over multi-GPU and multi-node environments with different data communication techniques, including PCIe, NVLink, 10GbE, and InfiniBand. Through both analytical and experimental studies, we identify the potential bottlenecks and overheads that could be further optimized. Finally, we make the data set of our experimental traces publicly available, which could be used to support simulation-based studies.
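To illustrate the kind of DAG model the abstract refers to, below is a minimal sketch in Python of one S-SGD iteration decomposed into layer-wise tasks: forward (f), backward (b), gradient all-reduce (c), and update (u). The task names, the dict-of-predecessors encoding, and the build_ssgd_dag helper are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch, assuming a layer-wise task decomposition of one
# synchronous SGD (S-SGD) iteration. Node names (f/b/c/u) and this
# dict-of-predecessors encoding are hypothetical, for illustration only.

from graphlib import TopologicalSorter  # Python 3.9+

def build_ssgd_dag(num_layers: int, num_workers: int):
    """Return {task: set(prerequisite tasks)} for one S-SGD iteration."""
    deps = {}
    for w in range(num_workers):
        for l in range(num_layers):
            # Forward of layer l waits for forward of layer l-1.
            deps[f"f{l}_w{w}"] = {f"f{l-1}_w{w}"} if l > 0 else set()
        for l in reversed(range(num_layers)):
            # Backward of the last layer waits for the full forward pass;
            # earlier layers wait for the next layer's backward.
            deps[f"b{l}_w{w}"] = (
                {f"f{num_layers-1}_w{w}"} if l == num_layers - 1
                else {f"b{l+1}_w{w}"}
            )
    for l in range(num_layers):
        # All-reduce of layer l's gradients needs that layer's backward on
        # every worker; it may overlap with backward of earlier layers.
        deps[f"c{l}"] = {f"b{l}_w{w}" for w in range(num_workers)}
    # The synchronous update starts only after every all-reduce finishes.
    deps["u"] = {f"c{l}" for l in range(num_layers)}
    return deps

if __name__ == "__main__":
    dag = build_ssgd_dag(num_layers=3, num_workers=2)
    order = list(TopologicalSorter(dag).static_order())
    print(order)  # one valid execution schedule for the iteration
```

Encoding the per-layer communication tasks as separate DAG nodes makes visible why communication can overlap with the remaining backward computation, which is the kind of bottleneck analysis the abstract describes.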
Year
2018
DOI
10.1109/PADSW.2018.8644932
Venue
2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS)
Keywords
Task analysis, Computational modeling, Training, Graphics processing units, Data models, Deep learning, Data communication
Field
Bottleneck, Stochastic gradient descent, Data set, InfiniBand, Computer science, Directed acyclic graph, General-purpose computing on graphics processing units, Artificial intelligence, Deep learning, PCI Express, Distributed computing
DocType
Conference
ISSN
1521-9097
ISBN
978-1-5386-7308-9
Citations
0
PageRank
0.34
References
0
Authors
4
Name            Order  Citations  PageRank
Shaohuai Shi    1      4          14.62
Qiang Wang      2      436        66.63
Xiaowen Chu     3      1273       101.81
Baochun Li      4      9416       614.20