Abstract |
---|
We present PYHESSIAN, a new scalable framework that enables fast computation of Hessian (i.e., second-order derivative) information for deep neural networks. PYHESSIAN computes the top Hessian eigenvalues, the Hessian trace, and the full Hessian eigenvalue spectral density; it supports distributed-memory execution on cloud/supercomputer systems; and it is available as open source [1]. This general framework can be used to analyze neural network models, in particular the topology of the loss landscape (i.e., curvature information), to gain insight into the behavior of different models and optimizers. As an example, we analyze the effect of residual connections and Batch Normalization layers on the trainability of neural networks. One recent claim, based on simpler first-order analysis, is that residual connections and Batch Normalization make the loss landscape "smoother," thus making it easier for Stochastic Gradient Descent to converge to a good solution. Our second-order analysis, easily enabled by PYHESSIAN, yields finer-scale insights, demonstrating that while conventional wisdom is sometimes validated, in other cases it is simply incorrect. In particular, we find that Batch Normalization does not necessarily make the loss landscape smoother, especially for shallow networks. |
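The fast Hessian-trace computation mentioned in the abstract is commonly done with Hutchinson's stochastic estimator, which needs only Hessian-vector products: tr(H) = E[vᵀHv] for random Rademacher vectors v. A minimal NumPy sketch of this technique on a toy explicit matrix (the function name and toy matrix are illustrative, not PyHessian's API; in a neural network the product Hv would come from a double backward pass rather than an explicit H):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy symmetric "Hessian" with exact trace 2 + 3 + 4 = 9.
H = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

def hutchinson_trace(hvp, dim, n_samples=10000, rng=rng):
    """Estimate tr(H) as the average of v^T (H v) over random
    Rademacher vectors v, using only the Hessian-vector product hvp."""
    est = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher vector
        est += v @ hvp(v)
    return est / n_samples

trace_est = hutchinson_trace(lambda v: H @ v, dim=3)
print(trace_est)  # close to the exact trace, 9.0
```

The same matrix-free pattern (repeated Hessian-vector products) underlies power iteration for top eigenvalues and Lanczos-type methods for the spectral density, which is what makes these quantities tractable at neural-network scale.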
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/BigData50022.2020.9378171 | 2020 IEEE International Conference on Big Data (Big Data) |
Keywords | DocType | ISSN |
---|---|---|
deep neural networks, Hessian eigenvalues, Hessian trace, distributed-memory execution, open source, neural network models, curvature information, shallow networks, loss landscape smoother, second-order analysis, first-order analysis, Batch Normalization layers, residual connections | Conference | 2639-1589 |
ISBN | Citations | PageRank |
---|---|---|
978-1-7281-6252-2 | 2 | 0.37 |
References | Authors |
---|---|
11 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhewei Yao | 1 | 31 | 10.58 |
Amir Gholami | 2 | 66 | 12.99 |
Kurt Keutzer | 3 | 5040 | 801.67 |
Michael W. Mahoney | 4 | 3297 | 218.10 |