Abstract
---
Adaptive stochastic gradient methods such as ADAGRAD have gained popularity, in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second-order information by accumulating past squared gradients, which are used to tune the step size adaptively per coordinate. In certain situations the full-matrix variant of ADAGRAD is expected to attain better performance; however, in high dimensions it is computationally impractical. We present ADA-LR and RADAGRAD, two computationally efficient approximations to full-matrix ADAGRAD based on randomized dimensionality reduction. They are able to capture dependencies between features and achieve performance similar to full-matrix ADAGRAD, but at a much smaller computational cost. We show that the regret of ADA-LR is close to the regret of full-matrix ADAGRAD, which can have an up to exponentially smaller dependence on the dimension than the diagonal variant. Empirically, we show that ADA-LR and RADAGRAD perform similarly to full-matrix ADAGRAD. On the task of training convolutional as well as recurrent neural networks, RADAGRAD achieves faster convergence than diagonal ADAGRAD.
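To make the contrast in the abstract concrete, below is a minimal NumPy sketch of the three preconditioners it discusses: the per-coordinate diagonal ADAGRAD update, the full-matrix update whose cost grows with the dimension d, and an illustrative sketched variant that accumulates curvature in a random k-dimensional subspace. The function names, the plain Gaussian projection `Pi`, and the choice k = 20 are assumptions for illustration only; the paper's actual ADA-LR and RADAGRAD constructions are likewise based on randomized dimensionality reduction but differ in detail.

```python
import numpy as np

def adagrad_diagonal_step(w, grad, acc, lr=0.01, eps=1e-8):
    """Diagonal ADAGRAD: accumulate squared gradients per coordinate and
    scale each coordinate's step by the inverse root of that sum. O(d) cost."""
    acc += grad ** 2
    w -= lr * grad / (np.sqrt(acc) + eps)
    return w, acc

def inv_sqrt(M, eps=1e-8):
    """Inverse matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(1.0 / (np.sqrt(np.maximum(vals, 0.0)) + eps)) @ vecs.T

def adagrad_full_step(w, grad, G, lr=0.01):
    """Full-matrix ADAGRAD: accumulate the outer-product matrix of past
    gradients and precondition with its inverse root. O(d^2) memory and an
    O(d^3) matrix root per step, hence impractical in high dimensions."""
    G += np.outer(grad, grad)
    w -= lr * inv_sqrt(G) @ grad
    return w, G

def adagrad_sketched_step(w, grad, S, Pi, lr=0.01):
    """Illustrative randomized variant (NOT the paper's exact algorithm):
    sketch gradients down to k << d dimensions with a fixed random matrix
    Pi (k x d), accumulate the small k x k curvature matrix, precondition
    there, and map the step back. This captures dependencies between
    features at O(k^2) storage instead of O(d^2)."""
    g_k = Pi @ grad             # k-dimensional sketch of the gradient
    S += np.outer(g_k, g_k)     # k x k accumulator, cheap to store and root
    w -= lr * Pi.T @ (inv_sqrt(S) @ g_k)
    return w, S

# Toy usage with a plain Gaussian sketch (illustrative choice).
rng = np.random.default_rng(0)
d, k = 1000, 20
Pi = rng.standard_normal((k, d)) / np.sqrt(k)
w, S = np.zeros(d), np.zeros((k, k))
for _ in range(5):
    grad = rng.standard_normal(d)
    w, S = adagrad_sketched_step(w, grad, S, Pi)
```

The sketched step illustrates the storage and per-step savings of working in a random subspace: only a k x k matrix is accumulated and rooted, while the preconditioned step is mapped back to the full d-dimensional space through the projection.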
Year | Venue | DocType
---|---|---
2016 | Advances in Neural Information Processing Systems 29 (NIPS 2016) | Conference

Volume | ISSN | Citations
---|---|---
29 | 1049-5258 | 1

PageRank | References | Authors
---|---|---
0.35 | 0 | 5
Name | Order | Citations | PageRank
---|---|---|---
Gabriel Krummenacher | 1 | 9 | 0.97
Brian McWilliams | 2 | 105 | 5.90
Yannic Kilcher | 3 | 8 | 4.28
Joachim M. Buhmann | 4 | 4363 | 730.34
Nicolai Meinshausen | 5 | 8 | 2.55