Title
Distributed SGD Generalizes Well Under Asynchrony
Abstract
Fully synchronized distributed systems have become a performance bottleneck in the era of big data, and asynchronous distributed systems are gaining popularity due to their superior scalability. In this paper, we study the generalization performance of stochastic gradient descent (SGD) on an asynchronous distributed system. The system consists of multiple worker machines that compute stochastic gradients, which are sent to and aggregated at a common parameter server to update the variables; communication in the system may suffer from delays. Under the algorithmic stability framework, we prove that distributed asynchronous SGD generalizes well given sufficiently many training samples. In particular, our results suggest reducing the learning rate as more asynchrony is allowed in the distributed system. Such an adaptive learning-rate strategy improves the stability of the distributed algorithm and reduces the corresponding generalization error. Finally, we confirm our theoretical findings via numerical experiments.
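The update scheme described in the abstract (a parameter server applying stochastic gradients computed at stale copies of the parameters, with the step size shrunk as the allowed delay grows) can be sketched in a few lines. The following single-process Python simulation is illustrative only: the function names, the random-delay model, and the 1/(1 + max_delay) learning-rate scaling are assumptions for demonstration, not the paper's exact algorithm or schedule.

```python
import numpy as np

def simulate_async_sgd(grad_fn, x0, n_steps=1000, max_delay=8,
                       base_lr=0.1, rng=None):
    """Minimal single-process sketch of asynchronous SGD with a parameter
    server: each update applies a stochastic gradient evaluated at a stale
    iterate (staleness <= max_delay). The learning rate is reduced as the
    allowed delay grows, mirroring the paper's suggestion to decrease the
    step size under more asynchrony. The 1/(1 + max_delay) scaling is an
    illustrative choice, not the authors' schedule."""
    rng = np.random.default_rng() if rng is None else rng
    lr = base_lr / (1.0 + max_delay)       # smaller step size for larger allowed delay
    history = [np.array(x0, dtype=float)]  # past iterates that "workers" may have read
    x = history[0].copy()
    for _ in range(n_steps):
        delay = rng.integers(0, max_delay + 1)            # random staleness of this gradient
        stale_x = history[max(len(history) - 1 - delay, 0)]
        x = x - lr * grad_fn(stale_x, rng)                # server applies the delayed gradient
        history.append(x.copy())
    return x

# Usage example: noisy gradients of the quadratic f(x) = 0.5 * ||x||^2.
if __name__ == "__main__":
    noisy_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
    x_final = simulate_async_sgd(noisy_grad, x0=np.ones(5), max_delay=8)
    print("final iterate:", x_final)
```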
Year
2019
DOI
10.1109/ALLERTON.2019.8919791
Venue
2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
Field
Asynchronous communication, Bottleneck, Synchronization, Stochastic gradient descent, Asynchronous system, Computer science, Server, Distributed algorithm, Scalability, Distributed computing
DocType
Conference
ISSN
2474-0195
Citations
0
PageRank
0.34
References
0
Authors
5
Name               Order   Citations   PageRank
Jayanth Regatti    1       0           0.34
Gaurav Tendolkar   2       0           0.34
Yi Zhou            3       65          17.55
Abhishek Gupta     4       0           0.34
Yingbin Liang      5       1646        147.64