Abstract
---
The value function of a Markov decision process (MDP) assigns to each policy its expected discounted reward. This expected reward can be estimated as the empirical average of the reward over many independent simulation runs. We derive bounds on the number of runs needed for the uniform convergence of the empirical average to the expected reward for a class of policies, in terms of the Vapnik-Chervonenkis or P-dimension of the policy class. Further, we show through a counterexample that whether we get uniform convergence for an MDP depends on the simulation method used. Uniform convergence results are also obtained for the average-reward case and for partially observed Markov decision processes, and they extend easily to Markov games. The results can be viewed as a contribution to empirical process theory and as an extension of probably approximately correct (PAC) learning theory to partially observable MDPs and Markov games.
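The estimator described in the abstract, an empirical average of discounted reward over independent simulation runs, can be sketched in Python for a toy example. The two-state MDP, the fixed policy, and all names here (`estimate_value`, `one_run`, `GAMMA`) are illustrative assumptions, not from the paper.

```python
import random

GAMMA = 0.9    # discount factor (assumed for this toy example)
HORIZON = 200  # truncation horizon; the geometric tail beyond it is negligible

# Transition kernel: P[state][action] -> list of (next_state, probability)
P = {
    0: {0: [(0, 0.7), (1, 0.3)], 1: [(1, 1.0)]},
    1: {0: [(0, 0.4), (1, 0.6)], 1: [(0, 1.0)]},
}
# Reward function r(state, action)
R = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}

def policy(s):
    """A fixed deterministic policy: the object whose value we estimate."""
    return 0 if s == 0 else 1

def sample_next(s, a, rng):
    """Draw the next state from the transition distribution P[s][a]."""
    u, acc = rng.random(), 0.0
    for s2, p in P[s][a]:
        acc += p
        if u < acc:
            return s2
    return P[s][a][-1][0]

def one_run(rng, s0=0):
    """One independent simulation run: total discounted reward of a trajectory."""
    s, total, disc = s0, 0.0, 1.0
    for _ in range(HORIZON):
        a = policy(s)
        total += disc * R[(s, a)]
        disc *= GAMMA
        s = sample_next(s, a, rng)
    return total

def estimate_value(n_runs, seed=0):
    """Empirical average over n_runs i.i.d. runs: the value-function estimate."""
    rng = random.Random(seed)
    return sum(one_run(rng) for _ in range(n_runs)) / n_runs
```

By the law of large numbers the empirical average converges to the true value of this one fixed policy; the paper's question is how many runs are needed for the convergence to hold *uniformly* over an entire policy class, which is what the VC- and P-dimension bounds address.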
Year | DOI | Venue |
---|---|---
2006 | 10.1137/040619508 | SIAM J. Control and Optimization |
Keywords | Field | DocType
---|---|---
pac learning, markov decision process, empirical average, markov games, uniform convergence, policy class, value function estimation, expected reward, simulation-based uniform value function, independent simulation, uniform rate of convergence, empirical process theory, simulation method, value function, rate of convergence, probably approximately correct | Mathematical optimization, Markov process, Partially observable Markov decision process, Markov model, Markov chain, Markov decision process, Variable-order Markov model, Markov kernel, Mathematics, Markov renewal process | Journal
Volume | Issue | ISSN |
---|---|---
45 | 5 | 0363-0129 |
Citations | PageRank | References |
---|---|---
8 | 0.70 | 13 |
Authors
---
2
Name | Order | Citations | PageRank |
---|---|---|---
Rahul Jain | 1 | 784 | 71.51 |
Pravin Varaiya | 2 | 2543 | 298.93 |