Title
Optimal Online Learning Procedures for Model-Free Policy Evaluation
Abstract
In this study, we extend the semiparametric statistical inference framework recently introduced to reinforcement learning [1] to online learning procedures for policy evaluation. This generalization enables us to investigate the statistical properties of value function estimators obtained by both batch and online procedures in a unified way, in terms of estimating functions. Furthermore, we propose a novel online learning algorithm with optimal estimating functions, which achieve the minimum estimation error. Our theoretical developments are confirmed using a simple chain walk problem.
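For readers unfamiliar with the setting, the following is a minimal sketch of online, model-free policy evaluation with linear features on a chain-walk MDP, written in Python with NumPy. It implements a plain TD(0) update, which is one simple instance of an estimating-function-based online procedure; it is not the optimal-estimating-function algorithm proposed in the paper, and all parameters (chain length, discount factor, learning rate, reward placement) are illustrative assumptions:

# Minimal sketch (not the paper's algorithm): online TD(0) policy evaluation
# with linear features on a simple chain-walk MDP. It only illustrates the
# general setting -- online, model-free value-function estimation -- referred
# to in the abstract. All parameters below are illustrative assumptions.
import numpy as np

n_states = 5          # chain of 5 states, random-walk policy (assumed)
gamma = 0.95          # discount factor (assumed)
alpha = 0.05          # learning rate (assumed)
n_steps = 20000

def features(s):
    # One-hot (tabular) features; the paper treats general linear features.
    phi = np.zeros(n_states)
    phi[s] = 1.0
    return phi

rng = np.random.default_rng(0)
theta = np.zeros(n_states)   # value-function parameters
s = n_states // 2
for _ in range(n_steps):
    # Random-walk policy: step left or right; reward +1 only when stepping
    # right at the last state (illustrative reward placement).
    step = rng.choice([-1, 1])
    s_next = min(max(s + step, 0), n_states - 1)
    r = 1.0 if (s == n_states - 1 and step == 1) else 0.0
    # TD(0) update: theta += alpha * td_error * phi(s), where the TD error
    # plays the role of a simple (non-optimal) estimating function.
    td_error = r + gamma * features(s_next) @ theta - features(s) @ theta
    theta += alpha * td_error * features(s)
    s = s_next

print("estimated state values:", np.round(theta, 3))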
Year
2009
DOI
10.1007/978-3-642-04174-7_31
Venue
ECML/PKDD
Keywords
statistical property, optimal online learning procedures, value function estimator, theoretical development, simple chain walk problem, model-free policy evaluation, minimum estimation error, semiparametric statistical inference, policy evaluation, reinforcement learning, online procedure, novel online, optimal estimation, statistical inference, value function
Field
Online algorithm, Online machine learning, Algorithmic learning theory, Active learning (machine learning), Computer science, Markov decision process, Unsupervised learning, Statistical inference, Artificial intelligence, Machine learning, Reinforcement learning
DocType
Conference
Volume
5782
ISSN
0302-9743
Citations
0
PageRank
0.34
References
9
Authors
4
Name              Order  Citations  PageRank
Tsuyoshi Ueno     1      14         4.37
Shin-ichi Maeda   2      26         8.11
Motoaki Kawanabe  3      1451       118.86
Shin Ishii        4      239        34.39