Title
On the convergence of reinforcement learning with Monte Carlo Exploring Starts
Abstract
A basic simulation-based reinforcement learning algorithm is the Monte Carlo Exploring Starts (MCES) method, also known as optimistic policy iteration, in which the value function is approximated by simulated returns and a greedy policy is selected at each iteration. The convergence of this algorithm in the general setting has been an open question. In this paper, we investigate the convergence of this algorithm for the case with undiscounted costs, also known as the stochastic shortest path problem. The results complement existing partial results on this topic and thereby help further settle the open problem.
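To make the algorithmic setting concrete, below is a minimal, textbook-style sketch of Monte Carlo Exploring Starts on a toy undiscounted (stochastic shortest path) problem. The chain environment, its transition probabilities, and all names are illustrative assumptions for this sketch; they are not taken from the paper, which analyzes convergence rather than prescribing an implementation.

```python
import random
from collections import defaultdict

N_STATES, GOAL = 5, 4          # chain of states 0..4; state 4 is the goal
ACTIONS = (0, 1)               # 0 = move left, 1 = move right (assumed toy MDP)

def step(state, action):
    """One transition of the toy MDP: unit cost per step, moves succeed w.p. 0.9."""
    if state == GOAL:
        return state, 0.0      # goal is absorbing and cost-free
    if random.random() < 0.9:
        state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return state, 1.0

def mces(num_episodes=20000, max_steps=200, seed=0):
    random.seed(seed)
    Q = defaultdict(float)     # estimated cost-to-go for each (state, action)
    counts = defaultdict(int)  # visit counts for incremental averaging
    policy = {s: random.choice(ACTIONS) for s in range(N_STATES)}
    for _ in range(num_episodes):
        # Exploring start: every (state, action) pair is chosen with
        # positive probability -- this is what "Exploring Starts" refers to.
        s, a = random.randrange(N_STATES), random.choice(ACTIONS)
        episode = []
        for _ in range(max_steps):  # cap length in case the policy never reaches the goal
            s2, cost = step(s, a)
            episode.append((s, a, cost))
            if s2 == GOAL:
                break
            s, a = s2, policy[s2]
        # Undiscounted returns (total cost-to-go), accumulated backwards.
        G, returns = 0.0, []
        for (s, a, cost) in reversed(episode):
            G += cost
            returns.append((s, a, G))
        returns.reverse()
        # First-visit Monte Carlo update of the Q estimates.
        visited = set()
        for (s, a, G) in returns:
            if (s, a) in visited:
                continue
            visited.add((s, a))
            counts[(s, a)] += 1
            Q[(s, a)] += (G - Q[(s, a)]) / counts[(s, a)]
        # Policy improvement: greedy (cost-minimizing) with respect to Q.
        for s in range(N_STATES):
            policy[s] = min(ACTIONS, key=lambda a: Q[(s, a)])
    return Q, policy

if __name__ == "__main__":
    Q, policy = mces()
    print({s: ("left", "right")[policy[s]] for s in range(N_STATES)})
```

Running this sketch should yield the "right" action in every non-goal state. Note that capping episode length truncates returns under poor early policies, a practical workaround in the undiscounted setting, where episodes need not terminate; the convergence questions studied in the paper concern the idealized algorithm, not this toy implementation.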
Year
2021
DOI
10.1016/j.automatica.2021.109693
Venue
Automatica
Keywords
Reinforcement learning, Markov decision processes, Stochastic control, Monte Carlo Exploring Starts, Optimistic policy iteration, Convergence, Stochastic shortest path problem
DocType
Journal
Volume
129
Issue
1
ISSN
0005-1098
Citations
0
PageRank
0.34
References
0
Authors
1
Name
Jun Liu
Order
1
Citations
2152
PageRank
0.63