Title
Generalized TD Learning
Abstract
Since the invention of temporal difference (TD) learning (Sutton, 1988), many new algorithms for model-free policy evaluation have been proposed. Although they have brought much progress in practical applications of reinforcement learning (RL), fundamental problems remain concerning the statistical properties of value function estimation. To solve these problems, we introduce a new framework, semiparametric statistical inference, for model-free policy evaluation. This framework generalizes TD learning and its extensions, and allows us to investigate the statistical properties of both batch and online learning procedures for value function estimation in a unified way, in terms of estimating functions. Furthermore, based on this framework, we derive an optimal estimating function with the minimum asymptotic variance and propose batch and online learning algorithms that achieve this optimality.
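For context, the abstract refers to TD learning as the baseline method being generalized. The sketch below is a minimal tabular TD(0) policy-evaluation routine in the spirit of Sutton (1988), not the paper's semiparametric estimating-function algorithm; the Gym-like environment interface (`reset()`, `step()`), the `policy` callable, and the integer state encoding are all assumptions made for illustration.

```python
import numpy as np

def td0_policy_evaluation(env, policy, n_states, n_episodes=500,
                          alpha=0.1, gamma=0.99):
    """Tabular TD(0) estimate of the state-value function V^pi.

    Assumes an environment with reset() -> state and
    step(action) -> (next_state, reward, done, info), where states
    are integer indices in [0, n_states).
    """
    V = np.zeros(n_states)
    for _ in range(n_episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done, _ = env.step(action)
            # TD(0) update: move V(s) toward the bootstrapped target
            # r + gamma * V(s'), with step size alpha; the target is
            # truncated to r at terminal transitions.
            target = reward + gamma * V[next_state] * (not done)
            V[state] += alpha * (target - V[state])
            state = next_state
    return V
```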
Year
2011
DOI
10.5555/1953048.2021063
Venue
Journal of Machine Learning Research
Keywords
generalized td learning, statistical property, value function estimation, framework generalizes, new framework, semiparametric statistical inference, fundamental problem, new algorithm, policy evaluation, reinforcement learning, model-free policy evaluation
DocType
Journal
Volume
12
ISSN
1532-4435
Citations
2
PageRank
0.42
References
24
Authors
4
Name               Order  Citations  PageRank
Tsuyoshi Ueno      1      14         4.37
Shin-ichi Maeda    2      26         8.11
Motoaki Kawanabe   3      1451       118.86
Shin Ishii         4      239        34.39