Abstract
---
We introduce a fast algorithm for entrywise evaluation of the Gauss-Newton Hessian (GNH) matrix for the fully connected feed-forward neural network. The algorithm has a precomputation step and a sampling step. While it generally requires O(Nn) work to compute an entry (and the entire column) of the GNH matrix for a neural network with N parameters and n data points, our fast sampling algorithm reduces the cost to O(n + d/ε^2) work, where d is the output dimension of the network and ε is a prescribed accuracy (independent of N). One application of our algorithm is constructing the hierarchical-matrix (H-matrix) approximation of the GNH matrix for solving linear systems and eigenvalue problems. It generally requires O(N^2) memory and O(N^3) work to store and factorize the GNH matrix, respectively. The H-matrix approximation requires only an O(N r_o) memory footprint and O(N r_o^2) work to be factorized, where r_o << N is the maximum rank of the off-diagonal blocks in the GNH matrix. We demonstrate the performance of our fast algorithm and the H-matrix approximation on classification and autoencoder neural networks.
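To make the O(Nn) baseline cost concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a least-squares loss, so the GNH reduces to J^T J, where J is the (n·d) × N Jacobian of the stacked network outputs with respect to the parameters. The tiny MLP, its layer sizes, and the finite-difference Jacobian are all illustrative assumptions; a real implementation would use reverse-mode automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(params, X, shapes):
    """Fully connected feed-forward net with tanh hidden layers (illustrative)."""
    out, idx = X, 0
    for k, (m, p) in enumerate(shapes):
        W = params[idx:idx + m * p].reshape(m, p); idx += m * p
        b = params[idx:idx + p]; idx += p
        out = out @ W + b
        if k < len(shapes) - 1:
            out = np.tanh(out)
    return out  # shape (n, d)

shapes = [(3, 4), (4, 2)]              # hypothetical (in, out) layer sizes
N = sum(m * p + p for m, p in shapes)  # number of parameters
n = 5                                  # number of data points
X = rng.standard_normal((n, 3))
theta = rng.standard_normal(N)

# Jacobian J of the stacked outputs w.r.t. parameters, via finite differences.
eps = 1e-6
f0 = mlp(theta, X, shapes).ravel()
J = np.empty((f0.size, N))
for j in range(N):
    e = np.zeros(N); e[j] = eps
    J[:, j] = (mlp(theta + e, X, shapes).ravel() - f0) / eps

# Gauss-Newton Hessian for a least-squares loss: GNH = J^T J (N x N).
GNH = J.T @ J

# One entry (i, j) is a dot product of two Jacobian columns: O(n*d) given J,
# but O(Nn) if the columns must be formed on the fly -- the cost the paper's
# sampling algorithm reduces to O(n + d/eps^2).
print(GNH[0, 1], J[:, 0] @ J[:, 1])
```

The GNH built this way is symmetric positive semidefinite by construction, which is why it is an attractive surrogate for the full Hessian in second-order optimization.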
Year | DOI | Venue
---|---|---
2021 | 10.1137/19M129961X | SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS

Keywords | DocType | Volume
---|---|---
Gauss-Newton Hessian, fast Monte Carlo sampling, hierarchical matrix, second-order optimization, multilayer perceptron | Journal | 42

Issue | ISSN | Citations
---|---|---
1 | 0895-4798 | 1

PageRank | References | Authors
---|---|---
0.35 | 0 | 5
Name | Order | Citations | PageRank |
---|---|---|---
Chao Chen | 1 | 2032 | 185.26 |
Severin Reiz | 2 | 1 | 0.35 |
Chenhan D. Yu | 3 | 1 | 0.35 |
Hans-Joachim Bungartz | 4 | 1 | 0.35 |
George Biros | 5 | 938 | 77.86 |