Title
How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective
Abstract
The question of which global minima are accessible to a stochastic gradient descent (SGD) algorithm with a specific learning rate and batch size is studied from the perspective of dynamical stability. The concept of non-uniformity is introduced, which, together with sharpness, characterizes the stability property of a global minimum and hence the accessibility of a particular SGD algorithm to that global minimum. In particular, this analysis shows that learning rate and batch size play different roles in minima selection. Extensive empirical results correlate well with the theoretical findings and provide further support for these claims.
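To make the dynamical-stability idea concrete, the sketch below illustrates the classical linear-stability condition for plain gradient descent on a toy quadratic loss: a minimum with sharpness a (the Hessian at the minimum) is stable only when eta * a <= 2. This is only an illustrative assumption-laden example, not the paper's SGD criterion, which according to the abstract also involves non-uniformity and batch size; the function name `run_gd` and all constants are arbitrary choices for the demonstration.

```python
# Illustrative sketch only: gradient descent on a toy quadratic loss
# f(x) = a * x^2 / 2, whose sharpness (second derivative) is a.
# The minimum x* = 0 is dynamically stable for GD only if eta * a <= 2;
# for larger sharpness the iterates oscillate and blow up.

def run_gd(sharpness, eta, x0=1.0, steps=100):
    """Iterate x <- x - eta * f'(x), with f'(x) = sharpness * x."""
    x = x0
    for _ in range(steps):
        x -= eta * sharpness * x
    return x

eta = 0.1  # learning rate; the stability threshold is sharpness <= 2 / eta = 20
for a in (5.0, 19.0, 21.0):
    x_final = run_gd(a, eta)
    verdict = "stable" if abs(x_final) <= 1.0 else "unstable"
    print(f"sharpness a = {a:4.1f}, eta*a = {eta * a:4.1f} -> |x_T| = {abs(x_final):.2e} ({verdict})")
```

Running the sketch shows the iterates contracting toward the minimum when eta * a is below 2 and diverging once it exceeds 2, which is the kind of accessibility threshold the abstract refers to.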
Year
2018
Venue
Advances in Neural Information Processing Systems 31 (NIPS 2018)
Keywords
global minimum, the concept, learning rate
Field
Mathematical optimization, Parameterized complexity, Gradient descent, Computer science, Maxima and minima
DocType
Conference
Volume
31
ISSN
1049-5258
Citations
7
PageRank
0.41
References
0
Authors
3
Name | Order | Citations / PageRank
Lei Wu | 1 | 5014.69
Chao Ma | 2 | 8527.49
Weinan E | 3 | 37646.45