Abstract
---
Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or on Kullback-Leibler divergence. We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both policy iteration and value iteration. The core building blocks of this theory are a notion of regularized Bellman operator and the Legendre-Fenchel transform, a classical tool of convex optimization. This approach allows for error propagation analyses of general algorithmic schemes of which (possibly variants of) classical algorithms such as Trust Region Policy Optimization, Soft Q-learning, Stochastic Actor Critic or Dynamic Policy Programming are special cases. This also draws connections to proximal convex optimization, especially to Mirror Descent.
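As a rough illustration of the regularized Bellman operator mentioned in the abstract (not the paper's exact formulation): with the negative-entropy regularizer, its Legendre-Fenchel transform reduces to a log-sum-exp over action values, and the maximizing policy is a softmax distribution, which is the special case underlying Soft Q-learning. The sketch below assumes a finite action set and a temperature parameter `tau`; the function name is illustrative.

```python
import numpy as np

def regularized_bellman_backup(q_values, tau=1.0):
    """Sketch of a regularized backup for one state, assuming the
    negative-entropy regularizer Omega(pi) = tau * sum(pi * log(pi)).

    Its Legendre-Fenchel transform Omega*(q) = tau * log(sum(exp(q / tau)))
    replaces the hard max of the standard Bellman operator, and the
    maximizing policy is the softmax of the action values.
    """
    q = np.asarray(q_values, dtype=float)
    scaled = q / tau
    shift = np.max(scaled)  # shift for numerical stability
    value = tau * (np.log(np.sum(np.exp(scaled - shift))) + shift)
    policy = np.exp(scaled - shift) / np.sum(np.exp(scaled - shift))
    return value, policy

# As tau -> 0, the regularized value approaches max(q) and the policy
# approaches the greedy (argmax) policy of the unregularized operator.
v, pi = regularized_bellman_backup([1.0, 2.0, 0.5], tau=0.5)
```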
Year | Venue | Field
---|---|---
2019 | arXiv: Learning | Trust region, Mathematical optimization, Divergence, Propagation of uncertainty, Computer science, Markov decision process, Regularization (mathematics), Operator (computer programming), Artificial intelligence, Convex optimization, Machine learning, Reinforcement learning

DocType | Volume | Citations
---|---|---
Journal | abs/1901.11275 | 0

PageRank | References | Authors
---|---|---
0.34 | 25 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Matthieu Geist | 1 | 385 | 44.31 |
Bruno Scherrer | 2 | 126 | 14.58 |
Olivier Pietquin | 3 | 664 | 68.60 |