Title
Equilibrated adaptive learning rates for non-convex optimization
Abstract
Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work that strongly suggests that most of the critical points encountered when training such networks are saddle points, we find how considering the presence of negative eigenvalues of the Hessian could help us design better suited adaptive learning rate schemes. We show that the popular Jacobi preconditioner has undesirable behavior in the presence of both positive and negative curvature, and present theoretical and empirical evidence that the so-called equilibration preconditioner is comparatively better suited to non-convex problems. We introduce a novel adaptive learning rate scheme, called ESGD, based on the equilibration preconditioner. Our experiments show that ESGD performs as well or better than RMSProp in terms of convergence speed, always clearly improving over plain stochastic gradient descent.
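The equilibration preconditioner described in the abstract admits a compact illustration. The sketch below, written in JAX, estimates the equilibration matrix stochastically as D_ii ≈ sqrt(E_v[(Hv)_i^2]) with Gaussian probe vectors v obtained through Hessian-vector products, and uses it to scale the gradient step on a toy non-convex objective with a saddle point. It is a minimal sketch under these assumptions; the objective, the names (esgd_step, hvp), the learning rate, and the damping constant are illustrative and not taken from the authors' implementation.

```python
# Minimal sketch of an equilibrated SGD (ESGD) step, assuming the stochastic
# equilibration estimate D_ii ~ sqrt(E_v[(Hv)_i^2]) with v ~ N(0, I).
# The toy objective and all hyperparameters below are illustrative assumptions.
import jax
import jax.numpy as jnp

def loss(w):
    # Toy non-convex objective with a saddle point at the origin
    # and minima near w = (0, +/-1).
    return 0.5 * w[0] ** 2 - 0.5 * w[1] ** 2 + 0.25 * w[1] ** 4

grad_fn = jax.grad(loss)

def hvp(w, v):
    # Hessian-vector product via forward-over-reverse differentiation.
    return jax.jvp(grad_fn, (w,), (v,))[1]

def esgd_step(w, D_acc, k, key, lr=0.1, damping=1e-4):
    # Accumulate (Hv)^2 for a fresh Gaussian probe vector v.
    v = jax.random.normal(key, w.shape)
    D_acc = D_acc + hvp(w, v) ** 2
    # Equilibration estimate: square root of the running average of (Hv)^2.
    D = jnp.sqrt(D_acc / k) + damping
    # Preconditioned gradient step.
    w = w - lr * grad_fn(w) / D
    return w, D_acc

w = jnp.array([1.0, 0.1])
D_acc = jnp.zeros_like(w)
key = jax.random.PRNGKey(0)
for k in range(1, 201):
    key, sub = jax.random.split(key)
    w, D_acc = esgd_step(w, D_acc, k, sub)
print(loss(w))  # approaches the minimum value of -0.25 near w = (0, 1)
```

For simplicity the sketch refreshes the curvature estimate at every iteration; amortizing the Hessian-vector product over several steps, as the abstract's emphasis on computational efficiency suggests, keeps the per-iteration cost close to plain SGD.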
Year
2015
Venue
Annual Conference on Neural Information Processing Systems
Field
Convergence (routing), Mathematical optimization, Stochastic gradient descent, Saddle point, Preconditioner, Computer science, Hessian matrix, Critical point (mathematics), Adaptive learning, Eigenvalues and eigenvectors
DocType
Conference
Volume
28
ISSN
1049-5258
Citations
52
PageRank
2.49
References
9
Authors
3
Name                Order  Citations  PageRank
Dauphin, Yann N.    1      979        49.26
Harm de Vries       2      239        12.50
Yoshua Bengio       3      42677      3039.83