Title
Global Convergence of Gradient Descent for Deep Linear Residual Networks
Abstract
We analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization: zero-asymmetric (ZAS) initialization. The initialization is motivated by avoiding the stable manifolds of saddle points. We prove that under the ZAS initialization, for an arbitrary target matrix, gradient descent converges to an epsilon-optimal point in O(L^3 log(1/epsilon)) iterations, which scales polynomially with the network depth L. Our result, together with the exp(Omega(L)) convergence time for the standard initializations (Xavier or near-identity) [18], demonstrates the importance of both the residual structure and the initialization in optimizing deep linear neural networks, especially when L is large.
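The abstract describes the ZAS initialization and the network architecture only at a high level. The NumPy sketch below is one plausible instantiation, offered as an illustration rather than the authors' method: it assumes the network computes f(x) = A (I + W_L) ... (I + W_1) x, initializes every residual block W_l and the output layer A to zero, and runs plain gradient descent on the squared Frobenius loss toward an arbitrary target matrix Phi. The width, depth, step size, and iteration count are illustrative assumptions, not the paper's constants.

import numpy as np

d, L, lr, steps = 4, 16, 2e-3, 5000       # illustrative values only
rng = np.random.default_rng(0)
Phi = rng.standard_normal((d, d))         # arbitrary target matrix
I = np.eye(d)

W = [np.zeros((d, d)) for _ in range(L)]  # residual blocks start at zero
A = np.zeros((d, d))                      # output layer starts at zero (assumed ZAS detail)

for step in range(steps):
    # Prefix products P[l] = (I + W_l) ... (I + W_1), with P[0] = I.
    P = [I]
    for Wl in W:
        P.append((I + Wl) @ P[-1])
    F = A @ P[L]                          # end-to-end matrix implemented by the network
    R = F - Phi                           # gradient of 0.5*||F - Phi||_F^2 w.r.t. F
    # Suffix products S[l] = A (I + W_L) ... (I + W_{l+1}), with S[L] = A.
    S = [None] * (L + 1)
    S[L] = A
    for l in range(L - 1, -1, -1):
        S[l] = S[l + 1] @ (I + W[l])
    # Plain gradient descent on the output layer and every residual block.
    gA = R @ P[L].T
    gW = [S[l + 1].T @ R @ P[l].T for l in range(L)]
    A = A - lr * gA
    W = [W[l] - lr * gW[l] for l in range(L)]
    if step % 1000 == 0:
        print(f"step {step:4d}  loss {0.5 * np.linalg.norm(R) ** 2:.6f}")

Under this sketch the network implements the zero map at initialization, so the initial loss equals 0.5*||Phi||_F^2 and the first update only moves the output layer; the printed losses should then decrease steadily for a sufficiently small step size.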
Year
2019
Venue
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019)
Keywords
neural networks
Field
Convergence, Residual, Gradient descent, Mathematical optimization, Computer science
DocType
Conference
Volume
32
ISSN
1049-5258
Citations
0
PageRank
0.34
References
0
Authors
3
Name            Order  Citations  PageRank
Wu, Lei         1      0          0.34
Wang, Qingcan   2      0          0.34
Chao Ma         3      852        7.49