Title
vqSGD: Vector Quantized Stochastic Gradient Descent
Abstract
In this work, we present a family of vector quantization schemes, vqSGD (Vector-Quantized Stochastic Gradient Descent), that provide an asymptotic reduction in the communication cost with convergence guarantees in first-order distributed optimization. In the process, we derive the following fundamental information-theoretic fact: $\Theta\left(\frac{d}{R^{2}}\right)$ bits are necessary and sufficient (up to an additive $O(\log d)$ term) to describe an unbiased estimator $\hat{\boldsymbol{g}}(\boldsymbol{g})$ for any $\boldsymbol{g}$ in the $d$-dimensional unit sphere, under the constraint that $\|\hat{\boldsymbol{g}}(\boldsymbol{g})\|_{2} \le R$ almost surely, $R > 1$. In particular, we consider a randomized scheme based on the convex hull of a point set that returns an unbiased estimator of a $d$-dimensional gradient vector with almost surely bounded norm. We provide multiple efficient instances of our scheme that are near optimal and require only $o(d)$ bits of communication, at the expense of a tolerable increase in error. The instances of our quantization scheme are obtained using well-known families of binary error-correcting codes and provide a smooth tradeoff between the communication cost and the estimation error of quantization. Furthermore, we show that vqSGD also offers automatic privacy guarantees.
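To make the convex-hull idea concrete, here is a minimal Python sketch of one such scheme using the scaled cross-polytope point set $C = \{\pm\sqrt{d}\,e_i : i \in [d]\}$, whose convex hull contains the unit ball. A gradient $\boldsymbol{g}$ with $\|\boldsymbol{g}\|_2 \le 1$ is written as a convex combination of the points in $C$, and one point is sampled with probability equal to its coefficient; only the sampled index and sign (about $\log_2 d + 1$ bits) are communicated. The function names (vq_quantize, vq_decode) and the even redistribution of the leftover probability mass are illustrative assumptions, not necessarily the paper's exact algorithm.

```python
import numpy as np

def vq_quantize(g, rng=np.random.default_rng()):
    """Sample one vertex of the scaled cross-polytope C = {±√d·e_i}
    so that the sampled point is an unbiased estimator of g.
    Requires ||g||_2 <= 1, which implies ||g||_1 <= √d, i.e. g lies
    in conv(C). The output is an (index, sign) pair: log2(d)+1 bits."""
    d = g.shape[0]
    gamma = np.sqrt(d)
    probs = np.abs(g) / gamma        # mass on vertex sign(g_i)·γ·e_i
    leftover = 1.0 - probs.sum()     # remaining mass, spread evenly over C
    if rng.random() < leftover:
        # A uniformly random vertex of C has mean zero, so this branch
        # does not bias the estimator.
        i = rng.integers(d)
        s = rng.choice([-1.0, 1.0])
    else:
        i = rng.choice(d, p=probs / probs.sum())
        s = np.sign(g[i]) if g[i] != 0 else 1.0
    return i, s

def vq_decode(i, s, d):
    """Server-side reconstruction of the transmitted point s·√d·e_i."""
    v = np.zeros(d)
    v[i] = s * np.sqrt(d)
    return v

# Usage: averaging many quantized samples recovers g (unbiasedness).
d = 16
g = np.random.randn(d)
g /= 2 * np.linalg.norm(g)           # ensure ||g||_2 <= 1
est = np.mean([vq_decode(*vq_quantize(g), d) for _ in range(200000)], axis=0)
print(np.linalg.norm(est - g))       # shrinks as the sample count grows
```

Averaging the decoded points over many workers (or rounds) concentrates around $\boldsymbol{g}$, which is what permits convergence guarantees despite the per-message cost being logarithmic rather than linear in $d$.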
Year
2022
DOI
10.1109/TIT.2022.3161620
Venue
IEEE Transactions on Information Theory
Keywords
Vector quantization, communication efficiency, mean estimation, stochastic gradient descent (SGD)
DocType
Journal
Volume
68
Issue
7
ISSN
0018-9448
Citations
0
PageRank
0.34
References
11
Authors
4
Name                Order   Citations   PageRank
Venkata Gandikota   1       0           0.34
Daniel M. Kane      2       7436        1.43
Raj Kumar Maity     3       0           1.69
Arya Mazumdar       4       3074        1.81