Title
Bias-corrected Q-learning to control max-operator bias in Q-learning
Abstract
We identify a class of stochastic control problems with highly random rewards and a high discount factor, which induce high levels of statistical error in the estimated action-value function. This error produces significant max-operator bias in Q-learning, which can cause the algorithm to diverge for millions of iterations. We present a bias-corrected Q-learning algorithm with asymptotically unbiased resistance against the max-operator bias, and show that, like Q-learning, it converges asymptotically to the optimal policy. We show experimentally that bias-corrected Q-learning performs well in a domain with highly random rewards, where Q-learning and other related algorithms suffer from the max-operator bias.
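The max-operator bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm or its correction term; it only shows, under assumed settings, that taking the maximum over noisy value estimates overestimates the true maximum. The function name `max_operator_bias` and all of its parameters are hypothetical, chosen for illustration.

```python
import random
import statistics


def max_operator_bias(num_actions=10, noise_std=1.0, samples_per_action=5,
                      trials=2000, seed=0):
    """Estimate E[max_a Q_hat(a)] - max_a Q(a) when every true value is 0.

    Assumption for illustration: all actions share the same true value,
    and each estimate is a sample mean of Gaussian rewards.
    """
    rng = random.Random(seed)
    true_value = 0.0  # every action has the same true value
    max_estimates = []
    for _ in range(trials):
        # Sample-mean estimate of each action's value from noisy rewards.
        estimates = [
            statistics.fmean(
                rng.gauss(true_value, noise_std)
                for _ in range(samples_per_action)
            )
            for _ in range(num_actions)
        ]
        # The max operator picks whichever estimate got the luckiest noise.
        max_estimates.append(max(estimates))
    # The true maximum is true_value, so any positive gap is pure bias.
    return statistics.fmean(max_estimates) - true_value


if __name__ == "__main__":
    print("bias with 10 actions:", max_operator_bias(num_actions=10))
    print("bias with 1 action: ", max_operator_bias(num_actions=1))
```

With a single action the gap should be close to zero, while with several actions it is positive and grows with the reward noise, which is consistent with the abstract's point that highly random rewards make the bias severe.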
Year
2013
DOI
10.1109/ADPRL.2013.6614994
Venue
Adaptive Dynamic Programming And Reinforcement Learning
Keywords
learning (artificial intelligence), stochastic systems, action-value function estimation, asymptotically unbiased resistance, bias-corrected Q-learning algorithm, discount factor, max-operator bias control, optimal policy, statistical error, stochastic control problems
Field
Mathematical optimization, Discounting, Q-learning, Operator (computer programming), Mathematics, Stochastic control
DocType
Conference
ISSN
2325-1824
Citations
4
PageRank
0.51
References
8
Authors
3
Name                Order   Citations   PageRank
Donghun Lee         1       228         34.37
Boris Defourny      2       25          6.26
Warren B. Powell    3       1614        151.46