Approximate Robust Policy Iteration for Discounted Infinite-Horizon Markov Decision Processes with Uncertain Stationary Parametric Transition Matrices - Citegraph

Paper Info

Title
Approximate Robust Policy Iteration for Discounted Infinite-Horizon Markov Decision Processes with Uncertain Stationary Parametric Transition Matrices

Abstract
We consider Markov decision processes with finite state and action sets and a discounted infinite-horizon cost, restricted to the deterministic policy space. The state transition matrices are uncertain but have a stationary parameterization. This uncertainty reflects the practical situation in which an accurate system model is not available for controller design because of limitations in estimation methods and model deficiencies. Based on the quadratic total value function formulation, two approximate robust policy iterations are developed whose performance errors are guaranteed to lie within an arbitrarily small error bound. The two approximations use iterative aggregation and a multilayer perceptron, respectively. It is proved that the robust policy iteration based on iterative aggregation converges surely to a stationary optimal or near-optimal policy, and that, under some conditions, the robust policy iteration based on the multilayer perceptron converges in a probabilistic sense to a stationary near-optimal policy. Furthermore, under some assumptions, the stationary solutions are guaranteed to be near-optimal in the deterministic policy space.
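
The setting in the abstract can be illustrated with a small sketch. This is not the paper's approximate robust policy iteration (it uses neither iterative aggregation nor a multilayer perceptron, and not the quadratic total value function formulation). It is a brute-force minimax over all deterministic policies and a finite grid of parameter values, which is only feasible for tiny problems. The two-state model, the cost table `COST`, the discount factor `GAMMA`, the grid `THETAS`, and the choice to score a policy by its mean cost over states are all assumptions made for illustration.

```python
import itertools
import numpy as np

GAMMA = 0.9  # discount factor (assumed)

# Hypothetical cost table: COST[s, a] is the one-step cost in state s under action a.
COST = np.array([[1.0, 2.0],
                 [4.0, 0.5]])

# Finite grid standing in for the uncertainty set of the parameter theta (assumed).
THETAS = np.linspace(0.1, 0.4, 4)


def transition(theta):
    """Hypothetical parametric model: result[a] is the transition matrix for action a.

    theta is unknown but fixed over time (stationary parameterization).
    """
    return np.array([
        [[1.0 - theta, theta], [0.5, 0.5]],   # action 0
        [[0.5, 0.5], [theta, 1.0 - theta]],   # action 1
    ])


def policy_cost(policy, theta):
    """Discounted infinite-horizon cost vector of a deterministic policy for fixed theta."""
    P = transition(theta)
    n = len(policy)
    # Row s of P_pi is the transition row of state s under the action the policy picks there.
    P_pi = np.array([P[policy[s], s] for s in range(n)])
    c_pi = np.array([COST[s, policy[s]] for s in range(n)])
    # Policy evaluation: solve (I - gamma * P_pi) v = c_pi.
    return np.linalg.solve(np.eye(n) - GAMMA * P_pi, c_pi)


def worst_case(policy):
    """Largest mean cost of the policy over the parameter grid."""
    return max(policy_cost(policy, theta).mean() for theta in THETAS)


def robust_policy():
    """Deterministic policy minimizing the worst-case cost, by exhaustive search."""
    policies = itertools.product(range(COST.shape[1]), repeat=COST.shape[0])
    return min(policies, key=worst_case)


if __name__ == "__main__":
    best = robust_policy()
    print("robust policy:", best, "worst-case mean cost:", worst_case(best))
```

The exhaustive search grows exponentially with the number of states, which is one reason approximate schemes such as those described in the abstract are of interest.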

Year	DOI	Venue
2007	10.1109/IJCNN.2007.4371274	IJCNN
Keywords	Field	DocType
near-optimal policy,stationary parameterization,multilayer perceptron,quadratic total value function formulation,matrix algebra,infinite horizon,estimation theory,multilayer perceptrons,state transition matrices,stationary parametric transition matrices,deterministic policy space,controller design,discounted infinite-horizon markov decision processes,stationary optimal policy,estimation methods,markov processes,iterative methods,iterative aggregation,approximate robust policy iteration,markov decision process,state transition,value function,system modeling	Mathematical optimization,Markov process,Computer science,Iterative method,Matrix (mathematics),Markov decision process,Bellman equation,Multilayer perceptron,Parametric statistics,Artificial intelligence,Estimation theory,Machine learning	Conference
ISSN	ISBN	Citations
1098-7576 E-ISBN : 978-1-4244-1380-5	978-1-4244-1380-5	0
PageRank	References	Authors
0.34	3	2

Authors (2 rows)

Name	Order	Citations	PageRank
Baohua Li	1	34	5.57
Jennie Si	2	746	70.23
