Title
Gradient Descent Happens in a Tiny Subspace.
Abstract
We show that in a variety of large-scale deep learning scenarios the gradient dynamically converges to a very small subspace after a short period of training. The subspace is spanned by a few top eigenvectors of the Hessian (equal to the number of classes in the dataset), and is mostly preserved over long periods of training. A simple argument then suggests that gradient descent may happen mostly in this subspace. We give an example of this effect in a solvable model of classification, and we comment on possible implications for optimization and learning.
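The abstract's central claim, that the gradient concentrates in the span of the top Hessian eigenvectors (as many as there are classes), can be probed numerically. Below is a minimal sketch, not code from the paper: for a small softmax-regression model on synthetic data it measures the fraction of the squared gradient norm captured by the top-k eigenvectors of the loss Hessian, with k equal to the number of classes. The model, data sizes, and names (loss, frac) are illustrative assumptions.

```python
# Minimal sketch (not from the paper): how much of the gradient of a
# softmax-regression loss lies in the span of the top-k Hessian eigenvectors,
# where k is the number of classes. All sizes and names are illustrative.
import jax
import jax.numpy as jnp

num_classes, dim, n = 3, 8, 256
kx, ky, kw = jax.random.split(jax.random.PRNGKey(0), 3)

# Synthetic classification data.
X = jax.random.normal(kx, (n, dim))
y = jax.random.randint(ky, (n,), 0, num_classes)

def loss(w_flat):
    # Cross-entropy loss of a linear softmax classifier; w_flat has shape (dim * num_classes,).
    W = w_flat.reshape(dim, num_classes)
    logp = jax.nn.log_softmax(X @ W, axis=-1)
    return -jnp.mean(logp[jnp.arange(n), y])

w = 0.1 * jax.random.normal(kw, (dim * num_classes,))

g = jax.grad(loss)(w)                  # gradient, shape (d,)
H = jax.hessian(loss)(w)               # full Hessian, shape (d, d); fine for small d
eigvals, eigvecs = jnp.linalg.eigh(H)  # eigenvalues in ascending order
top = eigvecs[:, -num_classes:]        # top-k eigenvectors (k = number of classes)

# Fraction of the squared gradient norm lying in the top-k eigenspace.
frac = jnp.sum((top.T @ g) ** 2) / jnp.sum(g ** 2)
print(f"fraction of |grad|^2 in top-{num_classes} Hessian subspace: {frac:.3f}")
```

At a random initialization this fraction need not be large; to see the effect the paper describes, one would track it along a training trajectory, where it is claimed to approach one after a short period of training.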
Year: 2018
Venue: arXiv: Learning
DocType: Journal
Volume: abs/1812.04754
Citations: 10
PageRank: 0.69
References: 0
Authors: 3
Name               Order  Citations  PageRank
Guy Gur-Ari        1      10         2.38
Daniel A. Roberts  2      64         4.08
Ethan Dyer         3      10         2.72