Title
Residual Networks are Exponential Ensembles of Relatively Shallow Networks
Abstract
In this work, we introduce a novel interpretation of residual networks showing they are exponential ensembles. This observation is supported by a large-scale lesion study that demonstrates they behave just like ensembles at test time. Subsequently, we perform an analysis showing these ensembles mostly consist of networks that are each relatively shallow. For example, contrary to our expectations, most of the gradient in a residual network with 110 layers comes from an ensemble of very short networks, i.e., only 10-34 layers deep. This suggests that in addition to describing neural networks in terms of width and depth, there is a third dimension: multiplicity, the size of the implicit ensemble. Ultimately, residual networks do not resolve the vanishing gradient problem by preserving gradient flow throughout the entire depth of the network; rather, they avoid the problem simply by ensembling many short networks together. This insight reveals that depth is still an open research question and invites the exploration of the related notion of multiplicity.
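The "exponential ensemble" claim can be made concrete by counting paths in an unraveled view of a residual network: each residual block is traversed either through its residual branch or through its identity skip, so n blocks give 2^n paths, and exactly C(n, k) of them pass through k residual branches. The sketch below assumes that view, assumes a 110-layer CIFAR ResNet has 54 residual blocks of two layers each (so "10-34 layers" corresponds to 5 to 17 blocks), and uses hypothetical helper names. It only counts paths and the paths left after deleting blocks, which mirrors the intuition behind the lesion study; it does not measure gradient magnitudes, which is what the abstract's claim about short networks actually rests on.

```python
from math import comb


def path_length_distribution(num_blocks: int) -> list[int]:
    """Number of paths that pass through exactly k residual branches, k = 0..num_blocks.

    Assumes the unraveled view: every block contributes either its residual
    branch or its identity skip, so there are comb(n, k) paths of length k.
    """
    return [comb(num_blocks, k) for k in range(num_blocks + 1)]


def fraction_of_paths_in_range(num_blocks: int, lo: int, hi: int) -> float:
    """Fraction of all 2**num_blocks paths whose length (in blocks) lies in [lo, hi]."""
    counts = path_length_distribution(num_blocks)
    return sum(counts[lo:hi + 1]) / 2 ** num_blocks


def paths_surviving_deletion(num_blocks: int, deleted: int) -> int:
    """Paths left after removing the residual branch of `deleted` blocks.

    Removing a branch (keeping its skip connection) eliminates every path that
    used it, so the remaining paths are those of a network with fewer blocks.
    """
    return 2 ** (num_blocks - deleted)


# Assumption: CIFAR ResNet-110 = 3 stages x 18 blocks, 2 conv layers per block.
n = 54
print("total paths:", sum(path_length_distribution(n)))
print("share of paths with 5-17 blocks (10-34 layers):", fraction_of_paths_in_range(n, 5, 17))
print("paths left after deleting one block:", paths_surviving_deletion(n, 1))
```

Because path lengths follow a binomial distribution centered at n/2 blocks, short paths are only a small share of the total; the paper's point is that they nonetheless carry most of the gradient, which a pure count like this cannot show.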
Year: 2016
Venue: arXiv: Computer Vision and Pattern Recognition
Field: Residual, Exponential function, Computer science, Multiplicity (mathematics), Artificial intelligence, Artificial neural network, Balanced flow, Vanishing gradient problem, Machine learning
DocType: Journal
Volume: abs/1605.06431
Citations: 13
PageRank: 0.75
References: 3
Authors: 3
Name | Order | Citations | PageRank
Andreas Veit | 1 | 50 | 4.85
Michael J. Wilber | 2 | 86 | 7.37
Serge J. Belongie | 3 | 12512 | 1010.13