Abstract |
---|
Communication overhead hinders the scalability of large-scale distributed training. Gossip SGD, where each node averages only with its neighbors, is more communication-efficient than the prevalent parallel SGD. However, its convergence rate is inversely proportional to the quantity $1-\beta$, which measures the network connectivity. On large and sparse networks where $1-\beta \to 0$, Gossip SGD requires more iterations to converge, which offsets its communication benefit. This paper introduces Gossip-PGA, which adds Periodic Global Averaging into Gossip SGD. Its transient stage, i.e., the number of iterations required to reach the asymptotic linear-speedup stage, improves from $\Omega(\beta^4 n^3/(1-\beta)^4)$ to $\Omega(\beta^4 n^3 H^4)$ for non-convex problems. The influence of the network topology in Gossip-PGA can be controlled by the averaging period $H$. Its transient-stage complexity is also superior to that of Local SGD, which is of order $\Omega(n^3 H^4)$. Empirical results of large-scale training on image classification (ResNet50) and language modeling (BERT) validate our theoretical findings. |
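The abstract describes Gossip-PGA at a high level: each worker runs gossip SGD, averaging its model only with its neighbors at every step, and every $H$ steps all workers perform one global average. The snippet below is a minimal NumPy sketch of that idea, not the authors' implementation; the ring topology, the synthetic quadratic objective, and all hyper-parameters are illustrative assumptions chosen only for demonstration.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly-stochastic mixing matrix W for a ring: each node averages with its two neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1 / 3
        W[i, (i - 1) % n] = 1 / 3
        W[i, (i + 1) % n] = 1 / 3
    return W

def gossip_pga(n=8, d=10, H=4, steps=200, lr=0.05, seed=0):
    """Sketch of Gossip-PGA: gossip SGD plus a global average every H iterations."""
    rng = np.random.default_rng(seed)
    # Each worker i holds a local quadratic loss 0.5 * ||x - a_i||^2 (heterogeneous data).
    targets = rng.normal(size=(n, d))
    x = np.zeros((n, d))                        # one model copy per worker
    W = ring_mixing_matrix(n)
    for t in range(1, steps + 1):
        grads = x - targets                     # local gradients (noise-free in this toy example)
        x = x - lr * grads                      # local SGD step on every worker
        if t % H == 0:
            x = np.tile(x.mean(axis=0), (n, 1)) # periodic global averaging (all-reduce)
        else:
            x = W @ x                           # gossip step: average only with ring neighbors
    return x

models = gossip_pga()
print("max deviation across workers:", np.abs(models - models.mean(axis=0)).max())
```

The periodic global average bounds how far the workers can drift apart, which is why, as the abstract states, the topology's influence is controlled by the period $H$ rather than by $1-\beta$ alone.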
Year | Venue | DocType
---|---|---|
2021 | INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | Conference

Volume | ISSN | Citations
---|---|---|
139 | 2640-3498 | 0

PageRank | References | Authors
---|---|---|
0.34 | 0 | 6
Name | Order | Citations | PageRank |
---|---|---|---|
Yiming Chen | 1 | 5 | 1.48 |
Kun Yuan | 2 | 0 | 0.68 |
Yingya Zhang | 3 | 44 | 1.97 |
Pan Pan | 4 | 0 | 0.34 |
Yinghui Xu | 5 | 172 | 20.23 |
Wotao Yin | 6 | 5038 | 243.92 |