Title |
---|
The Loss Surface of Residual Networks: Ensembles and the Role of Batch Normalization |
Abstract |
---|
Deep Residual Networks offer a premium in performance compared to conventional networks of the same depth and are trainable at extreme depths. It has recently been shown that Residual Networks behave like ensembles of relatively shallow networks. We show that these ensembles are dynamic: while initially the virtual ensemble is mostly at depths lower than half the network's depth, as training progresses, it becomes deeper and deeper. The main mechanism that controls the dynamic ensemble behavior is the scaling introduced, e.g., by the Batch Normalization technique. We explain this behavior and demonstrate the driving force behind it. As a main tool in our analysis, we employ generalized spin glass models, which we also use to study the number of critical points in the optimization of Residual Networks. |
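The abstract's ensemble claim can be illustrated numerically: in the "unraveled" view of a residual network, each of n blocks is either skipped or traversed, so the 2^n paths have effective depths distributed binomially with mean n/2, which is why the untrained ensemble concentrates at and below half the full depth. A minimal sketch of this counting argument (not code from the paper; the choice n = 54 is an arbitrary illustration):

```python
# Path-depth distribution in the "unraveled view" of a residual network.
# Each of n residual blocks is either skipped (identity) or traversed,
# so the number of paths with effective depth k is C(n, k).
from math import comb

n = 54  # number of residual blocks (hypothetical, for illustration)

# Number of paths at each effective depth k = 0 .. n.
path_counts = [comb(n, k) for k in range(n + 1)]
total_paths = sum(path_counts)  # equals 2**n

# Mean effective depth of the uniform path ensemble: n / 2.
mean_depth = sum(k * c for k, c in enumerate(path_counts)) / total_paths

print(total_paths == 2 ** n)  # True
print(mean_depth)             # 27.0  (= n / 2)
```

Under uniform path weighting the mass is symmetric around n/2; the paper's point is that training shifts this effective distribution toward deeper paths.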
Year | Venue | Field |
---|---|---|
2016 | arXiv: Computer Vision and Pattern Recognition | Residual, Normalization (statistics), Computer science, Spin glass, Artificial intelligence, Deep learning, Critical point (mathematics), Scaling, Machine learning |
DocType | Volume | Citations |
---|---|---|
Journal | abs/1611.02525 | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
E. Littwin | 1 | 6 | 2.53 |
Lior Wolf | 2 | 5501 | 352.38 |