Title
PyHessian: Neural Networks Through the Lens of the Hessian
Abstract
We present PyHessian, a new scalable framework that enables fast computation of Hessian (i.e., second-order derivative) information for deep neural networks. PyHessian computes the top Hessian eigenvalues, the Hessian trace, and the full Hessian eigenvalue/spectral density; it supports distributed-memory execution on cloud/supercomputer systems; and it is available as open source [1]. This general framework can be used to analyze neural network models, including the topology of the loss landscape (i.e., curvature information), to gain insight into the behavior of different models and optimizers. As an example, we analyze the effect of residual connections and Batch Normalization layers on the trainability of neural networks. One recent claim, based on simpler first-order analysis, is that residual connections and Batch Normalization make the loss landscape "smoother," thus making it easier for Stochastic Gradient Descent to converge to a good solution. Our second-order analysis, easily enabled by PyHessian, yields new finer-scale insights, demonstrating that while this conventional wisdom is sometimes validated, in other cases it is simply incorrect. In particular, we find that Batch Normalization does not necessarily make the loss landscape smoother, especially for shallow networks.
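PyHessian obtains these quantities matrix-free, via Hessian-vector products computed by backpropagating twice, rather than by forming the Hessian explicitly. As a rough illustration of the underlying idea, below is a minimal sketch of a Hutchinson-style Hessian trace estimator in PyTorch; the function name, arguments, and sample count are illustrative assumptions and do not reflect PyHessian's actual API.

```python
import torch
import torch.nn as nn

def hutchinson_trace(model, loss_fn, inputs, targets, n_samples=100):
    """Estimate tr(H) of the loss Hessian with Hutchinson's method:
    tr(H) = E[v^T H v] over random Rademacher probe vectors v.
    Illustrative sketch, not PyHessian's API."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(inputs), targets)
    # create_graph=True keeps the graph so the gradient can be differentiated again.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    estimates = []
    for _ in range(n_samples):
        # Rademacher probe vector: entries drawn uniformly from {-1, +1}.
        vs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
        # Hessian-vector product Hv via double backprop (Pearlmutter's trick).
        hvs = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
        # Accumulate the quadratic form v^T H v across all parameter tensors.
        estimates.append(sum((v * hv).sum().item() for v, hv in zip(vs, hvs)))
    return sum(estimates) / len(estimates)

if __name__ == "__main__":
    # Toy example: Hessian trace of a cross-entropy loss for a small linear model.
    model = nn.Linear(10, 2)
    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
    print(hutchinson_trace(model, nn.CrossEntropyLoss(), x, y))
```

In the same matrix-free spirit, the top Hessian eigenvalues can be obtained by power iteration on the Hessian-vector product, and the full spectral density by stochastic Lanczos quadrature.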
Year
2020
DOI
10.1109/BigData50022.2020.9378171
Venue
2020 IEEE International Conference on Big Data (Big Data)
Keywords
deep neural networks, Hessian eigenvalues, Hessian trace, distributed-memory execution, open source, neural network models, curvature information, shallow networks, loss landscape smoother, second-order analysis, first-order analysis, Batch Normalization layers, residual connections
DocType
Conference
ISSN
2639-1589
ISBN
978-1-7281-6252-2
Citations
2
PageRank
0.37
References
11
Authors
4
Name                Order  Citations  PageRank
Zhewei Yao          1      311        0.58
Amir Gholami        2      661        2.99
Kurt Keutzer        3      5040       801.67
Michael W. Mahoney  4      3297       218.10