Abstract |
---|
We show that in a variety of large-scale deep learning scenarios the gradient dynamically converges to a very small subspace after a short period of training. The subspace is spanned by a few top eigenvectors of the Hessian (equal to the number of classes in the dataset), and is mostly preserved over long periods of training. A simple argument then suggests that gradient descent may happen mostly in this subspace. We give an example of this effect in a solvable model of classification, and we comment on possible implications for optimization and learning. |
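The abstract's central measurement is straightforward to reproduce at toy scale: compute the loss Hessian, take its top-k eigenvectors (with k equal to the number of classes), and measure what fraction of the gradient's norm lies in their span. The sketch below is a minimal illustration in JAX, not the authors' code; the linear softmax model, the random data, and all hyperparameters are assumptions made purely for the demo, so the printed fraction will vary with them.

```python
import jax
import jax.numpy as jnp

k, d, n = 3, 20, 100   # classes, input dim, sample count (all illustrative)

key = jax.random.PRNGKey(0)
kx, ky, kw = jax.random.split(key, 3)
X = jax.random.normal(kx, (n, d))        # toy inputs
y = jax.random.randint(ky, (n,), 0, k)   # toy labels

def loss(w):
    """Cross-entropy loss of a linear softmax classifier with flat params w."""
    logits = X @ w.reshape(d, k)
    logp = jax.nn.log_softmax(logits)
    return -jnp.mean(logp[jnp.arange(n), y])

w = 0.1 * jax.random.normal(kw, (d * k,))

# A short period of plain gradient descent, matching the abstract's setup.
grad_fn = jax.jit(jax.grad(loss))
for _ in range(200):
    w = w - 0.5 * grad_fn(w)

g = grad_fn(w)                         # gradient at the trained point
H = jax.hessian(loss)(w)               # full Hessian (feasible at toy scale)

eigvals, eigvecs = jnp.linalg.eigh(H)  # eigenvalues in ascending order
top_k = eigvecs[:, -k:]                # top-k eigenvectors span the subspace

# Fraction of the gradient's squared norm lying in the top-k subspace;
# the paper reports this fraction quickly approaching 1 during training.
overlap = jnp.sum((top_k.T @ g) ** 2) / jnp.sum(g ** 2)
print(f"fraction of gradient in top-{k} Hessian subspace: {overlap:.3f}")
```

For realistic networks the full Hessian is too large to materialize, so this measurement is typically done with Hessian-vector products and an iterative eigensolver (e.g. Lanczos); the dense `jax.hessian` call here is only workable because the toy model has 60 parameters.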
Year | Venue | DocType |
---|---|---|
2018 | arXiv: Learning | Journal |
Volume | Citations | PageRank
---|---|---|
abs/1812.04754 | 10 | 0.69
References | Authors
---|---|
0 | 3
Name | Order | Citations | PageRank |
---|---|---|---|
Guy Gur-Ari | 1 | 10 | 2.38 |
Daniel A. Roberts | 2 | 64 | 4.08 |
Ethan Dyer | 3 | 10 | 2.72 |