| Abstract |
|---|
| In this study, we extend the semiparametric statistical inference framework recently introduced to reinforcement learning [1] to online learning procedures for policy evaluation. This generalization enables us to investigate the statistical properties of value function estimators obtained by both batch and online procedures in a unified way, in terms of estimating functions. Furthermore, we propose a novel online learning algorithm with optimal estimating functions, which achieves the minimum estimation error. Our theoretical developments are confirmed using a simple chain-walk problem. |
| Year | DOI | Venue |
|---|---|---|
| 2009 | 10.1007/978-3-642-04174-7_31 | ECML/PKDD |

| Keywords | Field | DocType |
|---|---|---|
| statistical property, optimal online learning procedures, value function estimator, theoretical development, simple chain walk problem, model-free policy evaluation, minimum estimation error, semiparametric statistical inference, policy evaluation, reinforcement learning, online procedure, novel online, optimal estimation, statistical inference, value function | Online algorithm, Online machine learning, Algorithmic learning theory, Active learning (machine learning), Computer science, Markov decision process, Unsupervised learning, Statistical inference, Artificial intelligence, Machine learning, Reinforcement learning | Conference |

| Volume | ISSN | Citations |
|---|---|---|
| 5782 | 0302-9743 | 0 |

| PageRank | References | Authors |
|---|---|---|
| 0.34 | 9 | 4 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Tsuyoshi Ueno | 1 | 14 | 4.37 |
| Shin-ichi Maeda | 2 | 26 | 8.11 |
| Motoaki Kawanabe | 3 | 1451 | 118.86 |
| Shin Ishii | 4 | 239 | 34.39 |