Title
Delayed Weight Update for Faster Convergence in Data-Parallel Deep Learning.
Abstract
This paper proposes a data-parallel stochastic gradient descent (SGD) method that uses a delayed weight update. Large-scale neural networks can solve advanced problems, but their processing time grows with the network scale. In conventional data parallelism, workers must wait for data communication to and from the server during each weight update. In the proposed data-parallel method, the network weights are delayed and therefore stale. Nevertheless, the method converges faster because it hides the latency of weight communication with the server: the server carries out the weight communication and the weight update concurrently while the workers compute their gradients. Experimental results demonstrate that the final accuracy of the proposed data-parallel method converges within a degradation of 1.5% of the conventional method for both VGG and ResNet. At maximum, the convergence speedup factor theoretically reaches double that of conventional data parallelism.
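The following is a minimal, illustrative sketch of the scheme described in the abstract, not the authors' implementation. It contrasts conventional data-parallel SGD with a delayed-weight-update variant on a toy least-squares problem, assuming a one-step weight delay and serial loops standing in for the workers and the server; all names (worker_gradient, w_server, w_workers) and hyperparameters are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

# Toy problem: least-squares regression split across "workers" (data shards).
n_workers, n_samples, dim = 4, 512, 16
X = rng.normal(size=(n_samples, dim))
w_true = rng.normal(size=dim)
y = X @ w_true + 0.01 * rng.normal(size=n_samples)
shards = np.array_split(np.arange(n_samples), n_workers)

def worker_gradient(w, idx):
    # Gradient of 0.5 * ||X w - y||^2 / n on one worker's shard.
    Xi, yi = X[idx], y[idx]
    return Xi.T @ (Xi @ w - yi) / len(idx)

lr, steps = 0.05, 200

# Conventional data parallelism: each step, workers compute gradients,
# then wait while the server aggregates, updates, and broadcasts weights.
w_conv = np.zeros(dim)
for t in range(steps):
    grads = [worker_gradient(w_conv, idx) for idx in shards]
    w_conv -= lr * np.mean(grads, axis=0)

# Delayed weight update: the server's update and weight communication for
# step t overlap with the workers' gradient computation for step t+1, so
# the workers' weights lag the server's weights by one step (stale weights).
w_server = np.zeros(dim)        # latest weights held by the server
w_workers = w_server.copy()     # stale copy currently held by the workers
for t in range(steps):
    grads = [worker_gradient(w_workers, idx) for idx in shards]
    w_workers = w_server                                 # broadcast finishes during compute
    w_server = w_server - lr * np.mean(grads, axis=0)    # overlapped server update

print("error, conventional:", np.linalg.norm(w_conv - w_true))
print("error, delayed     :", np.linalg.norm(w_server - w_true))

In the delayed loop, the gradient at step t is computed with the weights from step t-1; this staleness is the price of letting the server's update and communication overlap the workers' gradient computation.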
Year
2018
DOI
10.1109/GlobalSIP.2018.8646456
Venue
IEEE Global Conference on Signal and Information Processing
Keywords
Data synchronous parallelism, Delayed weight update, Distributed learning
Field
Convergence (routing), Stochastic gradient descent, Latency (engineering), Computer science, Parallel computing, Data parallelism, Artificial intelligence, Deep learning, Artificial neural network, Residual neural network, Speedup
DocType
Conference
ISSN
2376-4066
Citations
0
PageRank
0.34
References
0
Authors
7
Name                Order  Citations  PageRank
Tetsuya Youkawa     1      0          0.34
Haruki Mori         2      0          0.68
Yuki Miyauchi       3      0          0.68
Kazuki Yamada       4      0          0.34
Shintaro Izumi      5      82         31.56
masahiko yoshimoto  6      117        34.06
Hiroshi Kawaguchi   7      37         21.08