Abstract |
---|
We analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization: zero-asymmetric (ZAS) initialization, which is motivated by avoiding the stable manifolds of saddle points. We prove that under ZAS initialization, for an arbitrary target matrix, gradient descent converges to an ε-optimal point in O(L^3 log(1/ε)) iterations, which scales polynomially with the network depth L. Together with the exp(Ω(L)) convergence time for the standard initializations (Xavier or near-identity) [18], our result demonstrates the importance of the residual structure and the initialization in optimizing deep linear neural networks, especially when L is large. |
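
To make the optimization setup concrete, the sketch below trains a depth-L linear residual network f(x) = (I + W_L)···(I + W_1)x by plain gradient descent on the squared Frobenius loss against a target matrix Phi, with every residual block initialized to zero so the network starts as the identity map. This is only a minimal illustration of the zero-initialization idea, not the paper's exact ZAS construction: the architecture, the well-conditioned orthogonal target, the step size, and the iteration count are all illustrative assumptions.

```python
import numpy as np

d, L = 10, 20                               # width and depth (illustrative)
rng = np.random.default_rng(0)
# Well-conditioned orthogonal target for a robust demo; the paper's result
# covers arbitrary target matrices under its own step-size schedule.
Phi, _ = np.linalg.qr(rng.standard_normal((d, d)))
W = [np.zeros((d, d)) for _ in range(L)]    # zero init: every block starts as I
I = np.eye(d)
lr = 0.1 / L                                # heuristic 1/L-scale step size

def loss(W):
    """Squared Frobenius loss of the end-to-end map P = (I + W_L)...(I + W_1)."""
    P = I
    for Wl in W:
        P = (I + Wl) @ P
    return 0.5 * np.linalg.norm(P - Phi, "fro") ** 2

for step in range(501):
    # prefix[l] = (I + W_l)...(I + W_1), with prefix[0] = I.
    prefix = [I]
    for Wl in W:
        prefix.append((I + Wl) @ prefix[-1])
    R = prefix[-1] - Phi                    # end-to-end residual P - Phi
    # Walk top-down, maintaining suffix_l = (I + W_L)...(I + W_{l+1});
    # the gradient with respect to W_l is suffix_l^T R prefix_{l-1}^T.
    grads = [None] * L
    suffix = I
    for l in range(L - 1, -1, -1):
        grads[l] = suffix.T @ R @ prefix[l].T
        suffix = suffix @ (I + W[l])
    for l in range(L):                      # simultaneous gradient-descent step
        W[l] -= lr * grads[l]
    if step % 100 == 0:
        print(f"step {step:4d}  loss {loss(W):.3e}")
```

With all blocks starting at the identity, the per-layer gradients are initially identical and the effective end-to-end step is roughly lr × L, which is why a 1/L-scale step size is used here; for a small enough step the printed loss should decrease geometrically toward zero.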
Year | Venue | Keywords |
---|---|---
2019 | ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | neural networks |
Field | DocType | Volume
---|---|---
Convergence, Residual, Gradient descent, Mathematical optimization, Computer science | Conference | 32
ISSN | Citations | PageRank
---|---|---
1049-5258 | 0 | 0.34
References | Authors
---|---
0 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Wu, Lei | 1 | 0 | 0.34 |
Wang, Qingcan | 2 | 0 | 0.34 |
Ma, Chao | 3 | 85 | 27.49