Title
Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function
Abstract
This article suggests that deterministic Gradient Descent, which does not use any stochastic gradient approximation, can still exhibit stochastic behaviors. In particular, it shows that if the objective function exhibits multiscale behaviors, then in a large-learning-rate regime that resolves only the macroscopic but not the microscopic details of the objective, the deterministic GD dynamics can become chaotic and converge not to a local minimizer but to a statistical distribution. A sufficient condition is also established for approximating this long-time statistical limit by a rescaled Gibbs distribution. Both theoretical and numerical demonstrations are provided; the theoretical part relies on the construction of a stochastic map that uses bounded noise (as opposed to discretized diffusions).
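The phenomenon the abstract describes can be reproduced in a few lines. The sketch below is a toy illustration only, not the paper's actual experiments: the objective f(x) = x^2/2 + EPS*cos(x/EPS), the scale EPS, and the step size are illustrative assumptions chosen so that the learning rate sits between the micro- and macroscales.

```python
import numpy as np

# Toy multiscale objective (illustrative, not the paper's setup):
#   f(x) = x^2 / 2 + EPS * cos(x / EPS)
# macroscopic part x^2/2, microscopic oscillation of wavelength ~EPS.
EPS = 0.01

def grad_f(x):
    # f'(x) = x - sin(x / EPS)
    return x - np.sin(x / EPS)

def run_gd(x0, lr, n_steps):
    """Plain deterministic gradient descent; records the full trajectory."""
    xs = np.empty(n_steps)
    x = x0
    for i in range(n_steps):
        x -= lr * grad_f(x)
        xs[i] = x
    return xs

# Learning rate large relative to the microscale (lr >> EPS) but small
# relative to the macroscale (lr << 1): the GD map
#   x -> (1 - lr) * x + lr * sin(x / EPS)
# has derivative 0.9 + 10 * cos(x / EPS), which is locally expanding, so
# iterates wander chaotically instead of settling at a minimizer.
traj = run_gd(x0=1.0, lr=0.1, n_steps=200_000)
samples = traj[20_000:]  # drop the transient; the rest approximates
                         # the long-time statistical limit
print(f"mean = {samples.mean():.4f}, std = {samples.std():.4f}")
hist, edges = np.histogram(samples, bins=200, density=True)
```

Comparing the histogram of `samples` against a rescaled Gibbs-type density proportional to exp(-x^2/(2T)), with a fitted temperature T, gives a rough picture of the approximation result the abstract refers to; the precise rescaling and the sufficient condition for its validity are established in the paper itself.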
Year: 2020
Venue: NIPS 2020
DocType: Conference
Volume: 33
Citations: 0
PageRank: 0.34
References: 0
Authors: 2
Name          Order  Citations  PageRank
Lingkai Kong  1      231        27.91
Molei Tao     2      16         5.64