Abstract |
---|
We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric variational approximate posterior distribution. This distribution is implicitly defined as the transformation of an initial distribution by a sequence of optimization updates. By tracking the change in entropy over this sequence of transformations during optimization, we form a scalable, unbiased estimate of the variational lower bound on the log marginal likelihood. We can use this bound to optimize hyperparameters instead of using cross-validation. This Bayesian interpretation of SGD suggests improved, overfitting-resistant optimization procedures, and gives a theoretical foundation for popular tricks such as early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models. |
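
The abstract describes the core computation only at a high level: run SGD from a random initialization, and accumulate the change in entropy induced by each optimization update to form a single-sample estimate of the variational lower bound. The sketch below is an illustration of that idea, not the authors' code. It assumes a small Gaussian linear model so that the Hessian of the log joint can be formed exactly, uses a full-batch gradient rather than minibatches, and the names `sgd_elbo_estimate` and `log_joint` are invented for this example. The per-step entropy change is taken to be the log-determinant of the Jacobian of the update map.

```python
# Minimal sketch (assumptions noted above): interpret each gradient step
# theta <- theta + eta * grad(log p(theta, D)) as a deterministic transformation
# of an initial Gaussian, and track the entropy of the implicitly defined
# distribution via log|det(I + eta * H)| at every step.
import numpy as np

def log_joint(theta, X, y, prior_var=1.0):
    """Unnormalized log p(theta, D): Gaussian prior + Gaussian likelihood."""
    resid = y - X @ theta
    return -0.5 * np.sum(resid ** 2) - 0.5 * np.sum(theta ** 2) / prior_var

def grad_log_joint(theta, X, y, prior_var=1.0):
    return X.T @ (y - X @ theta) - theta / prior_var

def hessian_log_joint(theta, X, y, prior_var=1.0):
    return -X.T @ X - np.eye(theta.size) / prior_var

def sgd_elbo_estimate(X, y, eta=0.01, num_steps=100, init_scale=1.0, seed=0):
    """Single-sample estimate of the variational lower bound after T steps:
    log p(theta_T, D) + S_T, where S_T is the entropy of the distribution
    implicitly defined by pushing the initial Gaussian through the updates."""
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    theta = init_scale * rng.standard_normal(D)

    # Entropy of the initial isotropic Gaussian q_0.
    entropy = 0.5 * D * np.log(2 * np.pi * np.e * init_scale ** 2)

    for _ in range(num_steps):
        g = grad_log_joint(theta, X, y)
        H = hessian_log_joint(theta, X, y)

        # Jacobian of theta -> theta + eta * g is I + eta * H; its
        # log|det| is the change in entropy caused by this step.
        _, logdet = np.linalg.slogdet(np.eye(D) + eta * H)
        entropy += logdet

        theta = theta + eta * g

    return log_joint(theta, X, y) + entropy

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((50, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)
    print("lower-bound estimate:", sgd_elbo_estimate(X, y))
```

The exact log-determinant above is only feasible because the toy problem is low-dimensional; the abstract's claim that the estimator is scalable implies that, for neural network models, this entropy-change term would be approximated rather than computed exactly.
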
Year | Venue | Field
---|---|---
2015 | CoRR | Early stopping, Stochastic gradient descent, Mathematical optimization, Hyperparameter, Upper and lower bounds, Marginal likelihood, Posterior probability, Nonparametric statistics, Artificial intelligence, Machine learning, Mathematics, Estimator

DocType | Volume | Citations
---|---|---
Journal | abs/1504.01344 | 15

PageRank | References | Authors
---|---|---
0.69 | 12 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Dougal Maclaurin | 1 | 255 | 9.76 |
David K. Duvenaud | 2 | 629 | 32.63 |
Ryan P. Adams | 3 | 15 | 2.04 |