Title
Risk Aversion Operator For Addressing Maximization Bias In Q-Learning
Abstract
In Q-learning, the reduced chance of converging to the optimal policy is partly caused by bias in the estimated action values. Action-value estimation typically introduces overestimation or underestimation bias, which harms the current policy. The values produced by the maximization operator are overestimated, a phenomenon well known as maximization bias. To correct this bias, the double estimators operator shifts the values toward underestimation. However, according to the proposed analysis, the performance of the two operators (the maximization operator and the double estimators operator) depends on the unknown dynamics of the environment, in which the estimation bias results not only from the difference between the current policy and the optimal policy but also from the sampling error of the reward. The sampling error, which is amplified by these operators, creates a risk of converging to a non-optimal policy. To reduce this risk, this paper proposes a flexible operator, named the Risk Aversion operator, which uses the value of the most visited action instead of the greedy value and is inspired by humans' response to uncertainty. Based on this operator, Risk Aversion Q-learning is proposed, and bounds on the action values as well as convergence are proven. On three demonstration tasks whose optimal policy is known, the proposed algorithm increases the chance of converging to the optimal policy.
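The abstract only sketches the operator at a high level, so the following is a minimal, hypothetical sketch of the idea it describes: a tabular Q-learning update whose bootstrap target uses the value of the most visited action in the next state rather than the greedy maximum. The visit-count bookkeeping, tie-breaking, and hyperparameters below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))              # action-value estimates
N = np.zeros((n_states, n_actions), dtype=int)   # per (state, action) visit counts
alpha, gamma = 0.1, 0.99                         # assumed step size and discount

def risk_aversion_update(s, a, r, s_next):
    """One Q-learning step bootstrapping from the most visited next action."""
    N[s, a] += 1
    a_freq = int(np.argmax(N[s_next]))           # most visited action in s_next
    target = r + gamma * Q[s_next, a_freq]       # vs. the usual max_a Q[s_next, a]
    Q[s, a] += alpha * (target - Q[s, a])
```

The design intent, as described in the abstract, is that bootstrapping from a frequently visited (and therefore better-estimated) action value is less sensitive to reward sampling error than taking the maximum over possibly noisy estimates.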
Year
2020
DOI
10.1109/ACCESS.2020.2977400
Venue
IEEE ACCESS
Keywords
Uncertainty, Licenses, Convergence, Task analysis, Heuristic algorithms, Sociology, Statistics, Reinforcement learning, maximization bias, value iteration, Q-learning, risk aversion
DocType
Journal
Volume
8
ISSN
2169-3536
Citations
0
PageRank
0.34
References
0
Authors
4
Name            Order  Citations  PageRank
Bi Wang         1      0          0.34
Xuelian Li      2      0          0.34
Zhiqiang Gao    3      349        39.84
Yangjun Zhong   4      0          0.34