Bias-corrected Q-learning to control max-operator bias in Q-learning - Citegraph

Paper Info

Title
Bias-corrected Q-learning to control max-operator bias in Q-learning

Abstract
We identify a class of stochastic control problems with highly random rewards and high discount factor which induce high levels of statistical error in the estimated action-value function. This produces significant levels of max-operator bias in Q-learning, which can induce the algorithm to diverge for millions of iterations. We present a bias-corrected Q-learning algorithm with asymptotically unbiased resistance against the max-operator bias, and show that the algorithm asymptotically converges to the optimal policy, as Q-learning does. We show experimentally that bias-corrected Q-learning performs well in a domain with highly random rewards where Q-learning and other related algorithms suffer from the max-operator bias.
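
The max-operator bias named in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the paper's bias-corrected algorithm: it assumes a single state whose actions all have a true expected reward of zero, with Gaussian reward noise. The function names and parameter values (`max_of_sample_means`, `estimate_max_bias`, 10 actions, 5 samples per action) are hypothetical choices made for illustration only. Because the maximum is taken over noisy sample means, its average is above the true maximum of zero, and the gap grows with the reward noise.

```python
import random


def max_of_sample_means(num_actions, samples_per_action, noise_std, rng):
    """Return the largest sample-mean reward across actions.

    Every action has a true mean reward of 0 (an assumption of this sketch),
    so any positive average of the returned value is pure estimation bias.
    """
    means = []
    for _ in range(num_actions):
        draws = [rng.gauss(0.0, noise_std) for _ in range(samples_per_action)]
        means.append(sum(draws) / samples_per_action)
    return max(means)


def estimate_max_bias(num_actions=10, samples_per_action=5,
                      noise_std=1.0, trials=2000, seed=0):
    """Average the max of noisy estimates over many independent trials."""
    rng = random.Random(seed)
    values = [
        max_of_sample_means(num_actions, samples_per_action, noise_std, rng)
        for _ in range(trials)
    ]
    return sum(values) / len(values)


# The true maximum expected reward is 0 in both cases.
bias_low_noise = estimate_max_bias(noise_std=1.0)
bias_high_noise = estimate_max_bias(noise_std=10.0)
print("estimated bias, noise_std=1: ", bias_low_noise)
print("estimated bias, noise_std=10:", bias_high_noise)
```

In Q-learning the same maximum appears inside the update target and is multiplied by the discount factor at every step, which is consistent with the abstract's point that highly random rewards combined with a high discount factor make the bias significant.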

Year	2013
DOI	10.1109/ADPRL.2013.6614994
Venue	Adaptive Dynamic Programming And Reinforcement Learning
Keywords	learning (artificial intelligence), stochastic systems, action-value function estimation, asymptotically unbiased resistance, bias-corrected Q-learning algorithm, discount factor, max-operator bias control, optimal policy, statistical error, stochastic control problems
Field	Mathematical optimization, Discounting, Q-learning, Operator (computer programming), Mathematics, Stochastic control
DocType	Conference
ISSN	2325-1824
Citations	4
PageRank	0.51
References	8
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Donghun Lee	1	228	34.37
Boris Defourny	2	25	6.26
Warren B. Powell	3	1614	151.46
