Title
Fitting ReLUs via SGD and Quantized SGD
Abstract
In this paper we focus on the problem of finding the optimal weights of the shallowest of neural networks, consisting of a single Rectified Linear Unit (ReLU). These functions take the form x -> max(0, <w, x>), with w ∈ R^d denoting the weight vector. We focus on a planted model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to a planted weight vector. We first show that mini-batch stochastic gradient descent, when suitably initialized, converges at a geometric rate to the planted model with a number of samples that is optimal up to numerical constants. Next we focus on a parallel implementation where, in each iteration, the mini-batch gradient is computed in a distributed manner across multiple processors and then broadcast to a master node or to all other processors. To reduce the communication cost in this setting we use a quantized stochastic gradient descent (QSGD) scheme in which the partial gradients are quantized. Perhaps unexpectedly, we show that QSGD maintains the fast convergence of SGD to a globally optimal model while significantly reducing the communication cost. We corroborate our findings via various numerical experiments, including distributed implementations over Amazon EC2.
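The abstract describes mini-batch SGD on a single ReLU under a planted Gaussian model, together with a QSGD-style quantization of the gradients. The NumPy sketch below illustrates both ideas; it is not the paper's algorithm. The squared loss, the random initialization, the step size, the batch size, the quantization level count, and all names (relu_grad, qsgd_quantize, w_star, etc.) are illustrative assumptions standing in for the choices analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Planted model from the abstract: inputs x_i ~ N(0, I_d), labels y_i = max(0, <w*, x_i>).
d, n = 50, 2000                      # dimension and sample size (illustrative)
w_star = rng.normal(size=d)          # planted weight vector
X = rng.normal(size=(n, d))
y = np.maximum(0.0, X @ w_star)

def relu_grad(w, Xb, yb):
    # Gradient of the mini-batch squared loss 0.5 * mean((max(0, Xb @ w) - yb)^2).
    z = Xb @ w
    r = np.maximum(0.0, z) - yb
    return Xb.T @ (r * (z > 0)) / len(yb)

def qsgd_quantize(g, levels=4):
    # QSGD-style unbiased stochastic quantizer: transmit the norm, the signs, and a
    # randomly rounded low-precision magnitude for each coordinate.
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return g
    u = np.abs(g) / norm * levels
    q = np.floor(u) + (rng.random(g.shape) < (u - np.floor(u)))  # randomized rounding
    return np.sign(g) * q * norm / levels

# Mini-batch (Q)SGD loop; initialization, step size, and batch size are illustrative
# stand-ins for the carefully chosen quantities in the paper.
w = 0.1 * rng.normal(size=d)
step, batch = 0.2, 64
for _ in range(3000):
    idx = rng.integers(0, n, size=batch)
    g = relu_grad(w, X[idx], y[idx])
    w -= step * qsgd_quantize(g)     # drop qsgd_quantize for plain mini-batch SGD

print("distance to planted weights:", np.linalg.norm(w - w_star))
```

In a distributed run, each worker would compute relu_grad on its own mini-batch, apply qsgd_quantize before communicating, and the master would average the dequantized gradients; the single-process loop above only mimics that pipeline on one machine.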
Year
2019
DOI
10.1109/ISIT.2019.8849667
Venue
2019 IEEE International Symposium on Information Theory (ISIT)
Field
Convergence (routing), Discrete mathematics, Stochastic gradient descent, Mathematical optimization, Rectifier (neural networks), Weight, Gaussian, Quantization (physics), Artificial neural network, Mathematics, Exponential growth
DocType
Journal
Volume
abs/1901.06587
Citations
0
PageRank
0.34
References
15
Authors
3
Name                                Order   Citations   PageRank
Seyed Mohammadreza Mousavi Kalan    1       14          1.99
Mahdi Soltanolkotabi                2       409         25.97
Amir Salman Avestimehr              3       1880        157.39