Title
Momentum-Based Variance Reduction in Non-Convex SGD.
Abstract
Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and a willingness to use excessively large "mega-batches" in order to achieve their improved results. We present a new algorithm, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning. Our technique for removing the batches uses a variant of momentum to achieve variance reduction in non-convex optimization. On smooth losses $F$, STORM finds a point $x$ with $E[\|\nabla F(x)\|] \le O(1/\sqrt{T} + \sigma^{1/3}/T^{1/3})$ in $T$ iterations with $\sigma^2$ variance in the gradients, matching the optimal rate and without requiring knowledge of $\sigma$.
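For intuition, here is a minimal Python sketch of the momentum-based, variance-reduced update the abstract describes: each iteration draws one fresh stochastic sample and evaluates it at both the current and previous iterate (which is what cancels variance without mega-batches), and the learning rate adapts to the accumulated squared gradient norms. The function names, the constants k, w, c, and the toy quadratic problem are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def storm(grad_fn, sample_fn, x0, T=1000, k=0.1, w=0.1, c=1.0):
    """STORM-style loop: momentum-based variance reduction, no batches.

    grad_fn(x, xi) returns a stochastic gradient of F at x for sample xi;
    sample_fn() draws one fresh sample per iteration.
    """
    x = np.asarray(x0, dtype=float).copy()
    xi = sample_fn()
    d = grad_fn(x, xi)                        # initial direction: plain stochastic gradient
    G2 = float(d @ d)                         # running sum of squared gradient norms
    for _ in range(T):
        eta = k / (w + G2) ** (1.0 / 3.0)     # adaptive learning rate, no knowledge of sigma
        x_new = x - eta * d
        a = min(1.0, c * eta ** 2)            # momentum parameter shrinks with the step size
        xi = sample_fn()
        g_new = grad_fn(x_new, xi)            # the SAME sample xi is evaluated at both
        g_old = grad_fn(x, xi)                # points; their difference corrects d
        d = g_new + (1.0 - a) * (d - g_old)   # variance-reduced momentum update
        G2 += float(g_new @ g_new)
        x = x_new
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy problem: F(x) = 0.5 * ||x||^2 with additive gradient noise.
    sample_fn = lambda: rng.normal(size=5)
    grad_fn = lambda x, xi: x + 0.1 * xi
    print(storm(grad_fn, sample_fn, np.ones(5), T=2000))  # x should approach 0
```

Reusing one sample at two consecutive iterates is the key design choice: the correction term d - g_old contracts the estimation error at every step, which is how the sketch avoids both mega-batches and manually tuned step sizes.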
Year
2019
Venue
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019)
Keywords
convex optimization, nonconvex optimization, variance reduction
Field
Applied mathematics, Stochastic gradient descent, Mathematical optimization, Nabla symbol, Hyperparameter, Regular polygon, Momentum, Critical point (mathematics), Variance reduction, Adaptive learning, Mathematics
DocType
Journal
Volume
32
ISSN
1049-5258
Citations
0
PageRank
0.34
References
0
Authors
2
Name               Order  Citations  PageRank
Cutkosky, Ashok    1      14         10.02
Francesco Orabona  2      881        51.44