Title: Is Deeper Better only when Shallow is Good?
Abstract
Understanding the power of depth in feed-forward neural networks is an ongoing challenge in the field of deep learning theory. While recent work establishes the importance of depth for the expressive power of neural networks, it remains an open question whether these benefits are exploited during a gradient-based optimization process. In this work, we explore the relation between the expressivity properties of deep networks and the ability to train them efficiently using gradient-based algorithms. We give a depth separation argument for distributions with a fractal structure, showing that they can be expressed efficiently by deep networks, but not by shallow ones. These distributions have a natural coarse-to-fine structure, and we show that the balance between the coarse and fine details has a crucial effect on whether the optimization process is likely to succeed. We prove that when the distribution is concentrated on the fine details, gradient-based algorithms are likely to fail. Using this result, we prove that, at least for some distributions, the success of learning deep networks depends on whether the distribution can be approximated by shallower networks, and we conjecture that this property holds in general.
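The coarse-to-fine fractal structure described in the abstract can be illustrated with a toy Cantor-set-style sampler (a minimal illustrative sketch, not the paper's construction; the function names and the `fine_weight` mixing parameter are assumptions introduced here): each refinement level picks a sub-third of the current interval, so more levels mean finer detail, and a weight controls how much probability mass sits on the finest level.

```python
import random

def sample_cantor(depth: int, rng: random.Random) -> float:
    """Sample a point from the depth-`depth` approximation of the Cantor set.

    Each refinement level keeps either the left or the right third of the
    current interval; more levels correspond to finer (deeper) detail.
    """
    x = 0.0
    scale = 1.0
    for _ in range(depth):
        scale /= 3.0
        if rng.random() < 0.5:  # right third adds a base-3 digit of 2
            x += 2.0 * scale
    return x

def sample_coarse_to_fine(max_depth: int, fine_weight: float,
                          rng: random.Random) -> float:
    """Mix coarse and fine levels: with probability `fine_weight` draw from
    the finest approximation, otherwise from the coarsest one."""
    depth = max_depth if rng.random() < fine_weight else 1
    return sample_cantor(depth, rng)
```

In this toy picture, pushing `fine_weight` toward 1 concentrates the distribution on the fine details, the regime in which the abstract says gradient-based algorithms are likely to fail.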
Year: 2019
Venue: Advances in Neural Information Processing Systems 32 (NeurIPS 2019)
Keywords: feedforward neural network
Field: Mathematical optimization, Fractal, Theoretical computer science, Artificial intelligence, Deep learning, Artificial neural network, Expressive power, Conjecture, Mathematics, Expressivity
DocType: Journal
Volume: 32
ISSN: 1049-5258
Citations: 1
PageRank: 0.36
References: 13
Authors: 2
Name                  Order  Citations  PageRank
Malach, Eran          1      52         5.60
Shai Shalev-Shwartz   2      3681       276.32