Title
An Approximately Optimal Relative Value Learning Algorithm for Averaged MDPs with Continuous States and Actions
Abstract
It has long been a challenging problem to design algorithms for Markov decision processes (MDPs) with continuous states and actions that are provably approximately optimal and can provide arbitrarily good approximation for any MDP. In this paper, we propose an empirical value learning algorithm for average MDPs with continuous states and actions that combines empirical value iteration with function-parametric approximation and kernel density estimation of the transition probability distribution. We view each iteration as the application of a random operator and establish convergence using the probabilistic contraction analysis method that the authors (along with others) have recently developed.
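To make the abstract's ingredients concrete, below is a minimal Python sketch, not the paper's actual construction: empirical Bellman backups from sampled transitions, a parametric least-squares fit of the relative value function, and next-state expectations taken under a Gaussian kernel density estimate built from a fixed batch of transitions. The toy MDP, the polynomial feature map, the bandwidth, and all helper names (simulate, kde_draw, etc.) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D continuous MDP (illustration only): state in [0, 1],
# two candidate actions, reward peaks at x = 0.5.
ACTIONS = np.array([-0.1, 0.1])

def simulate(x, a, n):
    # Draw n next-state samples from the (unknown) transition kernel.
    return np.clip(x + a + 0.05 * rng.standard_normal(n), 0.0, 1.0)

def reward(x, a):
    return -(x - 0.5) ** 2  # action-independent for simplicity

# Collect a fixed batch of transitions per (anchor state, action).
X = np.linspace(0.0, 1.0, 21)        # anchor states on which h is fit
M, BW = 50, 0.02                     # batch size and KDE bandwidth (assumed)
batch = {(i, j): simulate(x, a, M)
         for i, x in enumerate(X) for j, a in enumerate(ACTIONS)}

def kde_draw(samples, n):
    # Sampling from a Gaussian KDE: pick a stored point, add kernel noise.
    picks = rng.choice(samples, size=n)
    return np.clip(picks + BW * rng.standard_normal(n), 0.0, 1.0)

def features(x):
    # Cubic polynomial features for the parametric approximation of h.
    return np.vander(np.atleast_1d(x), 4)

theta = np.zeros(4)                  # h(x) ~ features(x) @ theta
REF, N = 10, 200                     # reference-state index, samples per backup

# Empirical relative value iteration.
for _ in range(100):
    def h(x):
        return features(x) @ theta
    # (Th)(x) = max_a [ r(x, a) + E_KDE[ h(x') | x, a ] ], Monte Carlo estimate.
    Th = np.array([max(reward(x, ACTIONS[j]) +
                       h(kde_draw(batch[(i, j)], N)).mean()
                       for j in range(len(ACTIONS)))
                   for i, x in enumerate(X)])
    Th -= Th[REF]                    # relative value: subtract backup at x0
    theta, *_ = np.linalg.lstsq(features(X), Th, rcond=None)  # refit h

# The un-normalized backup at the reference state estimates the gain rho.
gain = max(reward(X[REF], ACTIONS[j]) +
           (features(kde_draw(batch[(REF, j)], N)) @ theta).mean()
           for j in range(len(ACTIONS)))
print("estimated gain:", round(float(gain), 4))

Subtracting the backup at a fixed reference state is what makes this relative value iteration; the discarded offset estimates the optimal average reward (gain) of the MDP.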
Year
2019
DOI
10.1109/ALLERTON.2019.8919719
Venue
2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
Field
Convergence, Approximation algorithm, Mathematical optimization, Markov process, Function approximation, Computer science, Algorithm, Markov decision process, Probability distribution, Probabilistic logic, Kernel density estimation
DocType
Conference
ISSN
2474-0195
Citations
0
PageRank
0.34
References
0
Authors
2
Name            Order  Citations  PageRank
Hiteshi Sharma  1      0          2.37
Rahul Jain      2      784        71.51