Title
Convergence of SGD in Learning ReLU Models with Separable Data.
Abstract
We consider the binary classification problem in which the objective function is the exponential loss with a ReLU model, and study the convergence properties of the stochastic gradient descent (SGD) algorithm on linearly separable data. We show that the gradient descent (GD) algorithm does not always learn desirable model parameters due to the nonlinear ReLU model. We then identify a condition on the data samples under which SGD learns a proper classifier with implicit bias. Specifically, we establish a sublinear convergence rate of the function value generated by SGD to the global minimum. We further show that SGD in fact converges in expectation to the maximum-margin classifier with respect to the samples with label +1 under the ReLU model, at the rate O(1/ln t). We also extend our study to the case of multiple ReLU neurons, and show that SGD converges to a certain nonlinear maximum-margin classifier for a class of nonlinearly separable data.
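For context, the following is a minimal sketch (not the authors' code) of the setting described in the abstract: SGD on the exponential loss of a single ReLU neuron, run on synthetic linearly separable data. The data generation, initialization, step size, and iteration count are illustrative assumptions, and the abstract's guarantees additionally require a condition on the samples that this toy example does not verify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linearly separable data: labels are the sign of a ground-truth direction.
d, n = 5, 200
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)
y[y == 0] = 1.0

def sgd_exp_relu(X, y, steps=5000, lr=0.05):
    """SGD on the per-sample loss exp(-y * relu(w @ x)), one random sample per step."""
    n, d = X.shape
    w = 0.01 * rng.normal(size=d)  # small nonzero init so the ReLU can activate
    for _ in range(steps):
        i = rng.integers(n)
        x_i, y_i = X[i], y[i]
        z = w @ x_i
        # Subgradient of exp(-y * max(z, 0)): zero whenever the ReLU is inactive (z <= 0).
        grad = np.zeros(d) if z <= 0 else -y_i * np.exp(-y_i * z) * x_i
        w -= lr * grad
    return w

w = sgd_exp_relu(X, y)
# Under the ReLU model only samples with label +1 can receive a positive score,
# so check activation on the +1 samples and inactivity on the -1 samples.
print("fraction of +1 samples with w.x > 0 :", np.mean((X[y > 0] @ w) > 0))
print("fraction of -1 samples with w.x <= 0:", np.mean((X[y < 0] @ w) <= 0))
```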
Year: 2018
Venue: arXiv: Learning
Field: Convergence (routing), Applied mathematics, Computer science, Separable space
DocType: Journal
Volume: abs/1806.04339
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name, Order, Citations, PageRank
Tengyu Xu, 1, 1, 5.75
Yi Zhou, 2, 65, 17.55
Kaiyi Ji, 3, 14, 6.58
Yingbin Liang, 4, 1646, 147.64