| Abstract |
|---|
| In this study, we extend the semiparametric statistical inference framework recently introduced to reinforcement learning [1] to online learning procedures for policy evaluation. This generalization enables us to investigate the statistical properties of value function estimators obtained by both batch and online procedures in a unified way, in terms of estimating functions. Furthermore, we propose a novel online learning algorithm with optimal estimating functions, which achieves the minimum estimation error. Our theoretical developments are confirmed using a simple chain-walk problem. |
| Year | DOI | Venue |
|---|---|---|
| 2009 | 10.1007/978-3-642-04174-7_31 | ECML/PKDD |

| Keywords | Field | DocType |
|---|---|---|
| statistical property, optimal online learning procedures, value function estimator, theoretical development, simple chain walk problem, model-free policy evaluation, minimum estimation error, semiparametric statistical inference, policy evaluation, reinforcement learning, online procedure, novel online, optimal estimation, statistical inference, value function | Online algorithm, Online machine learning, Algorithmic learning theory, Active learning (machine learning), Computer science, Markov decision process, Unsupervised learning, Statistical inference, Artificial intelligence, Machine learning, Reinforcement learning | Conference |

| Volume | ISSN | Citations |
|---|---|---|
| 5782 | 0302-9743 | 0 |

| PageRank | References | Authors |
|---|---|---|
| 0.34 | 9 | 4 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Tsuyoshi Ueno | 1 | 14 | 4.37 |
| Shin-ichi Maeda | 2 | 26 | 8.11 |
| Motoaki Kawanabe | 3 | 1451 | 118.86 |
| Shin Ishii | 4 | 239 | 34.39 |