Title
Distributed optimization for degenerate loss functions arising from over-parameterization
Abstract
We consider distributed optimization with degenerate loss functions, where the optimal sets of the local loss functions have a non-empty intersection. This regime often arises when optimizing large-scale multi-agent AI systems (e.g., deep learning systems), where the number of trainable weights far exceeds the number of training samples, leading to highly degenerate loss surfaces. Under appropriate conditions, we prove that distributed gradient descent in this case converges even when communication is arbitrarily infrequent, which is not the case for non-degenerate loss functions. Moreover, we quantitatively analyze the convergence rate, as well as the trade-off between communication and computation, providing insights into designing efficient distributed optimization algorithms. Our theoretical findings are confirmed by both distributed convex optimization and deep learning experiments.
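The abstract describes workers that run many local gradient steps between infrequent communication (averaging) rounds and still converge when the local optimal sets share a common point. The sketch below is a minimal illustration of that setting only, not the paper's algorithm or experiments; the function `distributed_gd`, the toy quadratic local losses, and the parameter names (`lr`, `comm_period`, `rounds`) are illustrative assumptions.

```python
import numpy as np

def distributed_gd(grads, x0, lr=0.1, comm_period=10, rounds=50):
    """Local gradient descent with periodic averaging across workers.

    grads       : list of callables; grads[i](x) returns the gradient of worker i's local loss.
    comm_period : number of local steps between communication (averaging) rounds.
    """
    workers = [x0.copy() for _ in grads]
    for _ in range(rounds):
        # Each worker takes several local gradient steps without communicating.
        for _ in range(comm_period):
            for i, g in enumerate(grads):
                workers[i] = workers[i] - lr * g(workers[i])
        # Infrequent communication: average the worker iterates.
        avg = np.mean(workers, axis=0)
        workers = [avg.copy() for _ in grads]
    return avg

# Toy degenerate example: each worker's quadratic loss is minimized on a whole
# subspace, and the two subspaces intersect at the origin, mirroring the
# "non-empty intersection of optimal sets" assumption in the abstract.
A1 = np.array([[1.0, 0.0], [0.0, 0.0]])  # worker 1: loss 0.5*x^T A1 x, minimized whenever x[0] = 0
A2 = np.array([[0.0, 0.0], [0.0, 1.0]])  # worker 2: loss 0.5*x^T A2 x, minimized whenever x[1] = 0
grads = [lambda x, A=A: A @ x for A in (A1, A2)]

x_final = distributed_gd(grads, x0=np.array([3.0, -2.0]), lr=0.5, comm_period=20)
print(x_final)  # approaches the shared minimizer (the origin) despite very sparse communication
```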
Year
2021
DOI
10.1016/j.artint.2021.103575
Venue
Artificial Intelligence
Keywords
Distributed optimization, Over-parameterization, Deep learning
DocType
Journal
Volume
301
Issue
1
ISSN
0004-3702
Citations
0
PageRank
0.34
References
0
Authors
2
Name          Order  Citations  PageRank
Chi Zhang     1      6          1.78
Qianxiao Li   2      0          1.01