Title
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Abstract
Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. This includes sparse spikes deconvolution or training a neural network with a single hidden layer. For these problems, we study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weights and positions. This is an idealization of the usual way to train neural networks with a large hidden layer. We show that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers. The proof involves Wasserstein gradient flows, a by-product of optimal transport theory. Numerical experiments show that this asymptotic behavior is already at play for a reasonable number of particles, even in high dimension.
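The sketch below is an assumed illustration (not code from the paper) of the discretization the abstract describes: a single-hidden-layer network is written as a mixture of m particles, each carrying an output weight and a position (hidden-unit parameters), and the continuous-time gradient flow is approximated by explicit gradient steps on both. The synthetic data, width m, tanh nonlinearity, and step size are arbitrary choices for the example.

```python
# Minimal sketch (assumed, not the authors' implementation) of gradient descent
# on the weights and positions of a particle discretization:
#     f(x) = (1/m) * sum_j w_j * tanh(<theta_j, x>)
# i.e. a single-hidden-layer network viewed as a mixture of m particles (w_j, theta_j).
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 200, 5, 100                      # samples, input dimension, number of particles
X = rng.standard_normal((n, d))
y = np.tanh(X @ rng.standard_normal(d))    # synthetic regression target (illustrative)

w = rng.standard_normal(m)                 # output weights, one per particle
theta = rng.standard_normal((m, d))        # particle positions (hidden-unit parameters)

def forward(w, theta, X):
    return np.tanh(X @ theta.T) @ w / m    # mean-field scaling 1/m

lr, steps = 1.0, 3000
for _ in range(steps):
    H = np.tanh(X @ theta.T)               # (n, m) hidden activations
    r = forward(w, theta, X) - y           # residuals of the squared loss
    grad_w = H.T @ r / (n * m)             # gradient w.r.t. the weights
    grad_theta = ((1.0 - H**2) * r[:, None] * w).T @ X / (n * m)  # gradient w.r.t. positions
    # Step size scaled by m so each particle moves at an O(1) rate as m grows.
    w -= m * lr * grad_w
    theta -= m * lr * grad_theta

print("final mean squared error:", np.mean((forward(w, theta, X) - y) ** 2))
```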
Year
2018
Venue
Advances in Neural Information Processing Systems 31 (NIPS 2018)
Keywords
gradient descent, machine learning, neural networks, global convergence, optimal transport, asymptotic behavior, signal processing, convex function, gradient flow
DocType
Conference
Volume
31
ISSN
1049-5258
Conference
NIPS 2018
Citations
14
PageRank
0.67
References
12
Authors
2

Name            Order   Citations   PageRank
Lénaïc Chizat   1       24          3.31
Francis Bach    2       11490       622.29