Abstract

In this work, we present a family of vector quantization schemes, *vqSGD* (Vector-Quantized Stochastic Gradient Descent), that provide an asymptotic reduction in the communication cost with convergence guarantees in first-order distributed optimization. In the process we derive the following fundamental information-theoretic fact: $\Theta\left(\frac{d}{R^{2}}\right)$ bits are necessary and sufficient (up to an additive $O(\log d)$ term) to describe an unbiased estimator $\hat{\boldsymbol{g}}(\boldsymbol{g})$ for any $\boldsymbol{g}$ in the $d$-dimensional unit sphere, under the constraint that $\|\hat{\boldsymbol{g}}(\boldsymbol{g})\|_{2}\le R$ almost surely, for $R > 1$. In particular, we consider a randomized scheme based on the convex hull of a point set that returns an unbiased estimator of a $d$-dimensional gradient vector with almost surely bounded norm. We provide multiple efficient instances of our scheme that are near optimal and require only $o(d)$ bits of communication, at the expense of a tolerable increase in estimation error. The instances of our quantization scheme are obtained using well-known families of binary error-correcting codes and provide a smooth tradeoff between the communication cost and the estimation error of quantization. Furthermore, we show that *vqSGD* also offers automatic privacy guarantees.
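As a concrete illustration of the convex-hull construction, the sketch below implements one natural instance, a scaled cross-polytope scheme: any $\boldsymbol{g}$ in the unit $\ell_2$ ball lies in the convex hull of the $2d$ points $\pm\sqrt{d}\,\boldsymbol{e}_i$ (since $\|\boldsymbol{g}\|_1 \le \sqrt{d}\,\|\boldsymbol{g}\|_2$), so sampling a single vertex with the convex-combination weights gives an unbiased estimator that is described by about $\log_2(2d)$ bits. This is a minimal sketch, not the authors' reference implementation; the function name and the handling of the leftover probability mass are illustrative choices.

```python
import numpy as np

def cross_polytope_quantize(g, rng=None):
    """Unbiased quantizer onto the 2d vertices +/- sqrt(d) * e_i.

    Assumes ||g||_2 <= 1, so g lies in the convex hull of the vertices
    (because ||g||_1 <= sqrt(d) * ||g||_2 <= sqrt(d)).
    """
    rng = rng or np.random.default_rng()
    d = g.shape[0]
    scale = np.sqrt(d)
    # Convex-combination weight |g_i|/sqrt(d) on the vertex sign(g_i)*sqrt(d)*e_i.
    probs = np.abs(g) / scale
    # Leftover mass; clamp guards against floating-point rounding.
    slack = max(0.0, 1.0 - probs.sum())
    # Split the leftover evenly between +sqrt(d)*e_0 and -sqrt(d)*e_0;
    # the pair cancels in expectation, preserving E[output] = g.
    p = np.concatenate([probs, [slack / 2.0, slack / 2.0]])
    idx = rng.choice(d + 2, p=p)
    out = np.zeros(d)
    if idx < d:
        out[idx] = np.sign(g[idx]) * scale
    else:
        out[0] = scale if idx == d else -scale
    # Only the chosen index and sign are communicated: ~log2(2d) bits.
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.normal(size=16)
    g /= np.linalg.norm(g)  # place g on the unit sphere
    avg = np.mean([cross_polytope_quantize(g, rng) for _ in range(50000)], axis=0)
    print(np.linalg.norm(avg - g))  # small: the estimator is unbiased
```

The output norm is always $\sqrt{d}$, so this instance sits at $R = \sqrt{d}$; plugging into the bound above gives $d/R^{2} + O(\log d) = O(\log d)$ bits, consistent with the $\log_2(2d)$-bit message.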
Field | Value
---|---
Year | 2022
DOI | 10.1109/TIT.2022.3161620
Venue | IEEE Transactions on Information Theory
Keywords | Vector quantization, communication efficiency, mean estimation, stochastic gradient descent (SGD)
DocType | Journal
Volume | 68
Issue | 7
ISSN | 0018-9448
Citations | 0
PageRank | 0.34
References | 11
Authors | 4
Name | Order | Citations | PageRank |
---|---|---|---
Venkata Gandikota | 1 | 0 | 0.34 |
Daniel M. Kane | 2 | 743 | 61.43 |
Raj Kumar Maity | 3 | 0 | 1.69 |
Arya Mazumdar | 4 | 307 | 41.81 |