| Title |
|---|
| Distributed optimization for degenerate loss functions arising from over-parameterization |
| Abstract |
|---|
| We consider distributed optimization with degenerate loss functions, where the optimal sets of the local loss functions have a non-empty intersection. This regime often arises in optimizing large-scale multi-agent AI systems (e.g., deep learning systems), where the number of trainable weights far exceeds the number of training samples, leading to highly degenerate loss surfaces. Under appropriate conditions, we prove that distributed gradient descent in this case converges even when communication is arbitrarily infrequent, which is not the case for non-degenerate loss functions. Moreover, we quantitatively analyze the convergence rate, as well as the communication-computation trade-off, providing insights into designing efficient distributed optimization algorithms. Our theoretical findings are confirmed by both distributed convex optimization and deep learning experiments. |
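The setting described in the abstract can be illustrated with a minimal sketch, not taken from the paper itself: two workers run local gradient descent on degenerate least-squares losses f_i(x) = ½‖A_i x − b_i‖² whose solution sets intersect (both are satisfied by a shared x*, since each A_i is underdetermined), and they communicate only every `tau` steps by averaging their iterates. All names, sizes, and step counts here are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: distributed GD with infrequent averaging on
# degenerate local losses f_i(x) = 0.5 * ||A_i x - b_i||^2, where each
# A_i has far fewer rows than columns (over-parameterized), so the
# local optimal sets are affine subspaces with a non-empty intersection.
rng = np.random.default_rng(0)
d, m = 10, 3                          # d parameters, only m equations per worker
x_star = rng.normal(size=d)           # a shared minimizer in the intersection
workers = []
for _ in range(2):
    A = rng.normal(size=(m, d))
    workers.append((A, A @ x_star))   # b_i = A_i x_star, so f_i(x_star) = 0

def local_gd(x, A, b, steps, lr):
    """Run plain gradient descent on 0.5 * ||A x - b||^2."""
    for _ in range(steps):
        x = x - lr * (A.T @ (A @ x - b))
    return x

x = np.zeros(d)
tau, lr = 25, 0.02                    # communicate only once per tau local steps
for _ in range(200):
    # Each worker descends its own loss; the server averages the results.
    x = np.mean([local_gd(x, A, b, tau, lr) for A, b in workers], axis=0)

total_loss = sum(0.5 * np.linalg.norm(A @ x - b) ** 2 for A, b in workers)
```

Because the local solution sets intersect, the periodic averaging does not pull the iterate away from a common minimizer, and `total_loss` is driven to (numerically) zero despite the sparse communication; with disjoint local optima the same scheme would stall at a non-zero consensus error.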
| Year | DOI | Venue |
|---|---|---|
| 2021 | 10.1016/j.artint.2021.103575 | Artificial Intelligence |

| Keywords | DocType | Volume |
|---|---|---|
| Distributed optimization, Over-parameterization, Deep learning | Journal | 301 |

| Issue | ISSN | Citations |
|---|---|---|
| 1 | 0004-3702 | 0 |

| PageRank | References | Authors |
|---|---|---|
| 0.34 | 0 | 2 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Chi Zhang | 1 | 6 | 1.78 |
| Qianxiao Li | 2 | 0 | 1.01 |