Abstract |
---|
In this paper we focus on the problem of finding the optimal weights of the shallowest of neural networks: one consisting of a single Rectified Linear Unit (ReLU). These functions take the form x ↦ max(0, ⟨w, x⟩), where w ∈ R^d denotes the weight vector. We focus on a planted model in which the inputs are drawn i.i.d. from a Gaussian distribution and the labels are generated according to a planted weight vector. We first show that mini-batch stochastic gradient descent (SGD), when suitably initialized, converges at a geometric rate to the planted model with a number of samples that is optimal up to numerical constants. Next we consider a parallel implementation in which, at each iteration, the mini-batch gradient is computed in a distributed manner across multiple processors and then broadcast to a master or to all other processors. To reduce the communication cost in this setting we use a Quantized Stochastic Gradient Descent (QSGD) scheme in which the partial gradients are quantized. Perhaps unexpectedly, we show that QSGD retains the fast convergence of SGD to a globally optimal model while significantly reducing the communication cost. We further corroborate these findings via various numerical experiments, including distributed implementations over Amazon EC2. |
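The setup described in the abstract can be sketched in a few lines of NumPy. The snippet below is an illustrative toy reconstruction, not the authors' implementation: it generates a planted single-ReLU model with Gaussian inputs, runs mini-batch SGD from an initialization near the planted weights, and applies an unbiased QSGD-style stochastic quantizer to each gradient before the update. The level count `s`, learning rate, batch size, and initialization scale are all assumed values chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def qsgd_quantize(v, s=4):
    """Unbiased stochastic quantization of v to s levels per coordinate
    (a QSGD-style sketch: round |v_i|/||v|| up or down so the result
    is correct in expectation)."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v
    ratio = np.abs(v) / norm              # each entry lies in [0, 1]
    lower = np.floor(ratio * s)           # lower quantization level
    p_up = ratio * s - lower              # probability of rounding up
    levels = (lower + (rng.random(v.shape) < p_up)) / s
    return norm * np.sign(v) * levels

# Planted model: i.i.d. Gaussian inputs, labels from a single ReLU.
d, n = 10, 500
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.maximum(0.0, X @ w_star)

# Mini-batch SGD with quantized gradients, initialized near w_star
# (standing in for the paper's "suitable initialization").
w = w_star + 0.1 * rng.standard_normal(d)
lr, batch = 0.05, 32
for _ in range(2000):
    idx = rng.integers(0, n, batch)
    Xb, yb = X[idx], y[idx]
    pred = np.maximum(0.0, Xb @ w)
    # (sub)gradient of 0.5 * mean (pred - y)^2 with respect to w
    g = Xb.T @ ((pred - yb) * (Xb @ w > 0)) / batch
    w -= lr * qsgd_quantize(g)

print(np.linalg.norm(w - w_star))
```

Because the quantizer is unbiased and its noise scales with the gradient norm, the quantization error shrinks as the iterates approach the planted weights, which is the intuition behind QSGD preserving the fast convergence of plain SGD.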
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ISIT.2019.8849667 | 2019 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT) |
Field | DocType | Volume |
Convergence (routing),Discrete mathematics,Stochastic gradient descent,Mathematical optimization,Rectifier (neural networks),Weight,Gaussian,Quantization (physics),Artificial neural network,Mathematics,Exponential growth | Journal | abs/1901.06587 |
Citations | PageRank | References |
0 | 0.34 | 15 |
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Seyed Mohammadreza Mousavi Kalan | 1 | 14 | 1.99 |
Mahdi Soltanolkotabi | 2 | 409 | 25.97 |
Amir Salman Avestimehr | 3 | 1880 | 157.39 |