Abstract |
---|
Since the invention of temporal difference (TD) learning (Sutton, 1988), many new algorithms for model-free policy evaluation have been proposed. Although they have brought much progress in practical applications of reinforcement learning (RL), fundamental problems concerning the statistical properties of value function estimation remain. To address these problems, we introduce a new framework, semiparametric statistical inference, for model-free policy evaluation. This framework generalizes TD learning and its extensions, and allows us to investigate the statistical properties of both batch and online learning procedures for value function estimation in a unified way, in terms of estimating functions. Furthermore, based on this framework, we derive an optimal estimating function with the minimum asymptotic variance and propose batch and online learning algorithms that achieve this optimality. |
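The abstract's starting point, standard TD(0) policy evaluation, can be sketched as follows. This is a minimal generic illustration on a hypothetical two-step Markov reward process (the chain, rewards, and step size are assumptions for the example), not the semiparametric estimator proposed in the paper.

```python
import numpy as np

# Tiny deterministic Markov reward process (illustrative assumption):
# state 0 -> state 1 (reward 0), state 1 -> state 2 (reward 1), state 2 terminal.
P = {0: (1, 0.0), 1: (2, 1.0)}  # state -> (next_state, reward)
gamma = 0.9   # discount factor
alpha = 0.1   # step size
V = np.zeros(3)  # value estimates, V[2] stays 0 (terminal)

for _ in range(2000):
    s = 0
    while s != 2:
        s_next, r = P[s]
        # TD(0) update: move V(s) toward the bootstrapped target r + gamma * V(s')
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

# For this chain the true values are V(1) = 1.0 and V(0) = gamma * 1.0 = 0.9.
```

The framework in the paper treats the TD error term inside this update as one member of a family of estimating functions and asks which member yields the smallest asymptotic variance.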
Year | DOI | Venue |
---|---|---|
2011 | 10.5555/1953048.2021063 | Journal of Machine Learning Research |
Keywords | DocType | Volume |
---|---|---|
generalized td learning, statistical property, value function estimation, framework generalizes, new framework, semiparametric statistical inference, fundamental problem, new algorithm, policy evaluation, reinforcement learning, model-free policy evaluation | Journal | 12 |
ISSN | Citations | PageRank |
---|---|---|
1532-4435 | 2 | 0.42 |
References | Authors |
---|---|
24 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Tsuyoshi Ueno | 1 | 14 | 4.37 |
Shin-ichi Maeda | 2 | 26 | 8.11 |
Motoaki Kawanabe | 3 | 1451 | 118.86 |
Shin Ishii | 4 | 239 | 34.39 |