Title
Delayed Weight Update for Faster Convergence in Data-Parallel Deep Learning.
Abstract
This paper proposes a data-parallel stochastic gradient descent (SGD) method that uses a delayed weight update. Large-scale neural networks can solve advanced problems, but their processing time grows with the network scale. In conventional data parallelism, workers must wait for data communication to and from the server during each weight update. In the proposed data-parallel method, the network weights are delayed and therefore stale. Nevertheless, the method converges faster because it hides the latency of weight communication with the server: the server carries out the weight communication and the weight update concurrently while the workers compute their gradients. Experimental results demonstrate that the final accuracy of the proposed data-parallel method converges within a degradation of 1.5% of the conventional method for both VGG and ResNet. At maximum, the convergence speedup factor theoretically reaches double that of conventional data parallelism.
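The following is a minimal, illustrative sketch of the scheme described in the abstract, not the authors' implementation. It contrasts conventional data-parallel SGD with a delayed-weight-update variant on a toy least-squares problem, assuming a one-step weight delay and serial loops standing in for the workers and the server; all names (worker_gradient, w_server, w_workers) and hyperparameters are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

# Toy problem: least-squares regression split across "workers" (data shards).
n_workers, n_samples, dim = 4, 512, 16
X = rng.normal(size=(n_samples, dim))
w_true = rng.normal(size=dim)
y = X @ w_true + 0.01 * rng.normal(size=n_samples)
shards = np.array_split(np.arange(n_samples), n_workers)

def worker_gradient(w, idx):
    # Gradient of 0.5 * ||X w - y||^2 / n on one worker's shard.
    Xi, yi = X[idx], y[idx]
    return Xi.T @ (Xi @ w - yi) / len(idx)

lr, steps = 0.05, 200

# Conventional data parallelism: each step, workers compute gradients,
# then wait while the server aggregates, updates, and broadcasts weights.
w_conv = np.zeros(dim)
for t in range(steps):
    grads = [worker_gradient(w_conv, idx) for idx in shards]
    w_conv -= lr * np.mean(grads, axis=0)

# Delayed weight update: the server's update and weight communication for
# step t overlap with the workers' gradient computation for step t+1, so
# the workers' weights lag the server's weights by one step (stale weights).
w_server = np.zeros(dim)        # latest weights held by the server
w_workers = w_server.copy()     # stale copy currently held by the workers
for t in range(steps):
    grads = [worker_gradient(w_workers, idx) for idx in shards]
    w_workers = w_server                                 # broadcast finishes during compute
    w_server = w_server - lr * np.mean(grads, axis=0)    # overlapped server update

print("error, conventional:", np.linalg.norm(w_conv - w_true))
print("error, delayed     :", np.linalg.norm(w_server - w_true))

In the delayed loop, the gradient at step t is computed with the weights from step t-1; this staleness is the price of letting the server's update and communication overlap the workers' gradient computation.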
Year
2018
DOI
10.1109/GlobalSIP.2018.8646456
Venue
IEEE Global Conference on Signal and Information Processing
Keywords
Data synchronous parallelism, Delayed weight update, Distributed learning
Field
Convergence (routing), Stochastic gradient descent, Latency (engineering), Computer science, Parallel computing, Data parallelism, Artificial intelligence, Deep learning, Artificial neural network, Residual neural network, Speedup
DocType
Conference
ISSN
2376-4066
Citations
0
PageRank
0.34
References
0
Authors
7
Name                Order  Citations  PageRank
Tetsuya Youkawa     1      0          0.34
Haruki Mori         2      0          0.68
Yuki Miyauchi       3      0          0.68
Kazuki Yamada       4      0          0.34
Shintaro Izumi      5      82         31.56
masahiko yoshimoto  6      117        34.06
Hiroshi Kawaguchi   7      37         21.08