Title
Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication.
Abstract
We consider decentralized stochastic optimization with the objective function (e.g. data samples for a machine learning task) being distributed over $n$ machines that can only communicate with their neighbors on a fixed communication graph. To reduce the communication bottleneck, the nodes compress (e.g. quantize or sparsify) their model updates. We cover both unbiased and biased compression operators with quality denoted by $\omega \leq 1$ ($\omega = 1$ meaning no compression). We (i) propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations and $\delta$ the eigengap of the connectivity matrix. Although the compression quality and the network connectivity affect the higher-order terms of the rate, the first term, $\mathcal{O}(1/(nT))$, is the same as for the centralized baseline with exact communication. We (ii) present a novel gossip algorithm, CHOCO-GOSSIP, for the average consensus problem that converges in time $\mathcal{O}(1/(\delta^2 \omega) \log(1/\epsilon))$ for accuracy $\epsilon > 0$. This is (to the best of our knowledge) the first gossip algorithm that supports arbitrary compressed messages for $\omega > 0$ and still exhibits linear convergence. We (iii) show in experiments that both of our algorithms outperform the respective state-of-the-art baselines and that CHOCO-SGD can reduce communication by at least two orders of magnitude.
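The compression operators mentioned in the abstract can be illustrated concretely. The following is a minimal sketch (not the authors' implementation), assuming the compression quality $\omega$ is measured via $\mathbb{E}\|Q(x) - x\|^2 \leq (1 - \omega)\|x\|^2$; the operator names top_k and rand_k and the NumPy realization are illustrative choices, not taken from the paper.

# Illustrative sketch only: two sparsification operators of the kind the
# abstract mentions, assuming quality omega is measured through
# E||Q(x) - x||^2 <= (1 - omega) * ||x||^2  (omega = 1 meaning no compression).
import numpy as np

def top_k(x: np.ndarray, k: int) -> np.ndarray:
    # Biased compression: keep only the k entries of largest magnitude.
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    # Random sparsification: keep k uniformly chosen entries
    # (omega = k/d for a d-dimensional vector under the assumption above).
    out = np.zeros_like(x)
    idx = rng.choice(x.size, size=k, replace=False)
    out[idx] = x[idx]
    return out

# Usage: instead of transmitting its full model update, a node would send only
# the k retained coordinates, e.g. top_k(update, k).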
Year
2019
Venue
arXiv: Learning
Field
Stochastic optimization, Gossip algorithms, Computer science, Theoretical computer science, Artificial intelligence, Machine learning
DocType
Volume
abs/1902.00340
Citations
3
Journal
PageRank
0.36
References
27
Authors
3
Name                  Order  Citations  PageRank
Anastasia Koloskova   1      3          1.38
Sebastian U. Stich    2      135        16.54
Martin Jaggi          3      852        54.16