Title
A unified theory of adaptive stochastic gradient descent as Bayesian filtering.
Abstract
We formulate stochastic gradient descent (SGD) as a Bayesian filtering problem. Inference in the Bayesian setting naturally gives rise to BRMSprop and BAdam: Bayesian variants of RMSprop and Adam. Remarkably, the Bayesian approach recovers many features of state-of-the-art adaptive SGD methods, including, amongst others, root-mean-square normalization, Nesterov acceleration and AdamW. As such, the Bayesian approach provides one explanation for the empirical effectiveness of state-of-the-art adaptive SGD algorithms. Empirically comparing BRMSprop and BAdam with naive RMSprop and Adam on MNIST, we find that the Bayesian methods have the potential to considerably reduce test loss and classification error.
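For context, the sketch below shows the standard Adam update referred to in the abstract (with the first-moment averaging switched off it is closely related to RMSprop). The paper's BRMSprop and BAdam are Bayesian variants of these rules and are not reproduced here; the function name adam_step, the toy quadratic loss, and the default hyperparameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of the standard Adam update (Kingma & Ba, 2015)."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the mean
    v_hat = v / (1 - beta2 ** t)                 # bias correction for the variance
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # root-mean-square normalized step
    return theta, m, v

# Illustrative usage on a toy quadratic loss, (theta - 0.5)^2 per coordinate:
theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
for t in range(1, 101):
    grad = 2 * theta - 1.0                       # gradient of the toy loss
    theta, m, v = adam_step(theta, grad, m, v, t)
```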
Year
2018
Venue
arXiv: Machine Learning
Field
Kronecker delta, Gradient descent, Mathematical optimization, Stochastic gradient descent, Bayesian optimization, Hessian matrix, Gaussian, Prior probability, Artificial neural network, Mathematics
DocType
Journal
Volume
abs/1807.07540
Citations
1
PageRank
0.37
References
0
Authors
1
Name
Aitchison, Laurence
Order
1
Citations
20
PageRank
7.00