Abstract |
---|
We show that in a variety of large-scale deep learning scenarios the gradient dynamically converges to a very small subspace after a short period of training. The subspace is spanned by a few top eigenvectors of the Hessian (equal to the number of classes in the dataset), and is mostly preserved over long periods of training. A simple argument then suggests that gradient descent may happen mostly in this subspace. We give an example of this effect in a solvable model of classification, and we comment on possible implications for optimization and learning. |
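The abstract's central measurement is straightforward to reproduce at toy scale: compute the loss Hessian, take its top-k eigenvectors (with k equal to the number of classes), and measure what fraction of the gradient's norm lies in their span. The sketch below is a minimal illustration in JAX, not the authors' code; the linear softmax model, the random data, and all hyperparameters are assumptions made purely for the demo, so the printed fraction will vary with them.

```python
import jax
import jax.numpy as jnp

k, d, n = 3, 20, 100   # classes, input dim, sample count (all illustrative)

key = jax.random.PRNGKey(0)
kx, ky, kw = jax.random.split(key, 3)
X = jax.random.normal(kx, (n, d))        # toy inputs
y = jax.random.randint(ky, (n,), 0, k)   # toy labels

def loss(w):
    """Cross-entropy loss of a linear softmax classifier with flat params w."""
    logits = X @ w.reshape(d, k)
    logp = jax.nn.log_softmax(logits)
    return -jnp.mean(logp[jnp.arange(n), y])

w = 0.1 * jax.random.normal(kw, (d * k,))

# A short period of plain gradient descent, matching the abstract's setup.
grad_fn = jax.jit(jax.grad(loss))
for _ in range(200):
    w = w - 0.5 * grad_fn(w)

g = grad_fn(w)                         # gradient at the trained point
H = jax.hessian(loss)(w)               # full Hessian (feasible at toy scale)

eigvals, eigvecs = jnp.linalg.eigh(H)  # eigenvalues in ascending order
top_k = eigvecs[:, -k:]                # top-k eigenvectors span the subspace

# Fraction of the gradient's squared norm lying in the top-k subspace;
# the paper reports this fraction quickly approaching 1 during training.
overlap = jnp.sum((top_k.T @ g) ** 2) / jnp.sum(g ** 2)
print(f"fraction of gradient in top-{k} Hessian subspace: {overlap:.3f}")
```

For realistic networks the full Hessian is too large to materialize, so this measurement is typically done with Hessian-vector products and an iterative eigensolver (e.g. Lanczos); the dense `jax.hessian` call here is only workable because the toy model has 60 parameters.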
Year | Venue | DocType |
---|---|---|
2018 | arXiv: Learning | Journal |
Volume | Citations | PageRank
---|---|---|
abs/1812.04754 | 10 | 0.69
References | Authors
---|---|
0 | 3
Name | Order | Citations | PageRank |
---|---|---|---|
Guy Gur-Ari | 1 | 10 | 2.38 |
Daniel A. Roberts | 2 | 64 | 4.08 |
Ethan Dyer | 3 | 10 | 2.72 |