Title
The Loss Surface of Residual Networks: Ensembles and the Role of Batch Normalization.
Abstract
Deep Residual Networks present a premium in performance in comparison to conventional networks of the same depth and are trainable at extreme depths. It has recently been shown that Residual Networks behave like ensembles of relatively shallow networks. We show that these ensembles are dynamic: while initially the virtual ensemble is mostly at depths lower than half the network’s depth, as training progresses, it becomes deeper and deeper. The main mechanism that controls the dynamic ensemble behavior is the scaling introduced, e.g., by the Batch Normalization technique. We explain this behavior and demonstrate the driving force behind it. As a main tool in our analysis, we employ generalized spin glass models, which we also use in order to study the number of critical points in the optimization of Residual Networks.
Year
2016
Venue
arXiv: Computer Vision and Pattern Recognition
Field
Residual, Normalization (statistics), Computer science, Spin glass, Artificial intelligence, Deep learning, Critical point (mathematics), Scaling, Machine learning
DocType
Journal
Volume
abs/1611.02525
Citations
0
PageRank
0.34
References
0
Authors
2
Name         Order  Citations  PageRank
Littwin, E.  1      6          2.53
Lior Wolf    2      5501       352.38