Abstract |
---|
We analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization: zero-asymmetric (ZAS) initialization, which is motivated by avoiding the stable manifolds of saddle points. We prove that under ZAS initialization, for an arbitrary target matrix, gradient descent converges to an ε-optimal point in O(L^3 log(1/ε)) iterations, which scales polynomially with the network depth L. Together with the exp(Ω(L)) convergence time for the standard initializations (Xavier or near-identity) [18], our result demonstrates the importance of the residual structure and the initialization in optimizing deep linear neural networks, especially when L is large. |
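
To make the optimization setup concrete, the sketch below trains a depth-L linear residual network f(x) = (I + W_L)···(I + W_1)x by plain gradient descent on the squared Frobenius loss against a target matrix Phi, with every residual block initialized to zero so the network starts as the identity map. This is only a minimal illustration of the zero-initialization idea, not the paper's exact ZAS construction: the architecture, the well-conditioned orthogonal target, the step size, and the iteration count are all illustrative assumptions.

```python
import numpy as np

d, L = 10, 20                               # width and depth (illustrative)
rng = np.random.default_rng(0)
# Well-conditioned orthogonal target for a robust demo; the paper's result
# covers arbitrary target matrices under its own step-size schedule.
Phi, _ = np.linalg.qr(rng.standard_normal((d, d)))
W = [np.zeros((d, d)) for _ in range(L)]    # zero init: every block starts as I
I = np.eye(d)
lr = 0.1 / L                                # heuristic 1/L-scale step size

def loss(W):
    """Squared Frobenius loss of the end-to-end map P = (I + W_L)...(I + W_1)."""
    P = I
    for Wl in W:
        P = (I + Wl) @ P
    return 0.5 * np.linalg.norm(P - Phi, "fro") ** 2

for step in range(501):
    # prefix[l] = (I + W_l)...(I + W_1), with prefix[0] = I.
    prefix = [I]
    for Wl in W:
        prefix.append((I + Wl) @ prefix[-1])
    R = prefix[-1] - Phi                    # end-to-end residual P - Phi
    # Walk top-down, maintaining suffix_l = (I + W_L)...(I + W_{l+1});
    # the gradient with respect to W_l is suffix_l^T R prefix_{l-1}^T.
    grads = [None] * L
    suffix = I
    for l in range(L - 1, -1, -1):
        grads[l] = suffix.T @ R @ prefix[l].T
        suffix = suffix @ (I + W[l])
    for l in range(L):                      # simultaneous gradient-descent step
        W[l] -= lr * grads[l]
    if step % 100 == 0:
        print(f"step {step:4d}  loss {loss(W):.3e}")
```

With all blocks starting at the identity, the per-layer gradients are initially identical and the effective end-to-end step is roughly lr × L, which is why a 1/L-scale step size is used here; for a small enough step the printed loss should decrease geometrically toward zero.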
Year | Venue | Keywords |
---|---|---
2019 | ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | neural networks |
Field | DocType | Volume
---|---|---
Convergence, Residual, Gradient descent, Mathematical optimization, Computer science | Conference | 32
ISSN | Citations | PageRank
---|---|---
1049-5258 | 0 | 0.34
References | Authors
---|---
0 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Wu, Lei | 1 | 0 | 0.34 |
Wang, Qingcan | 2 | 0 | 0.34 |
Ma, Chao | 3 | 85 | 27.49