Title
Reliability of internal prediction/estimation and its application. I. Adaptive action selection reflecting reliability of value function.
Abstract
This article proposes an adaptive action-selection method for a model-free reinforcement learning system, based on the concept of the 'reliability of internal prediction/estimation'. This concept is realized by an internal variable, called the Reliability Index (RI), which estimates the accuracy of the internal estimator. We define this index for the value function of a temporal difference learning system and substitute it for the temperature parameter of the Boltzmann action-selection rule, so that the weight of exploratory actions changes adaptively with the uncertainty of the prediction. We apply this idea to both tabular and weighted-sum type value functions. Moreover, we use the RI to adjust the learning coefficient in addition to the temperature parameter, making reliability a general basis for meta-learning. Numerical experiments were performed to examine the behavior of the proposed method. The RI-based Q-learning system showed its characteristic behavior when the adaptive learning coefficient and a large RI-discount rate (which determines how the RI values of future states are reflected in the RI value of the current state) were introduced. Statistical tests confirmed that the algorithm spent more time exploring in the initial phase of learning but learned faster from the midpoint onward. It is also shown that the proposed method does not work well with actor-critic models. The limitations of the proposed method and its relationship to relevant research are discussed.
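For illustration, below is a minimal Python sketch of the kind of RI-based Q-learning the abstract describes. It is a sketch under stated assumptions, not the paper's implementation: the specific RI update rule (blending the absolute TD error with the discounted RI of the successor state), the convention that a large RI value means low reliability (so RI can stand in directly for the Boltzmann temperature), and the RI-scaled learning coefficient are all assumptions made for this example.

```python
import numpy as np

class RIQLearner:
    """Sketch of Q-learning with a Reliability Index (RI) per state.

    Assumptions (not taken from the paper): RI tracks prediction-error
    magnitude, so a LARGE RI value means LOW reliability; the Boltzmann
    temperature is simply replaced by the current state's RI value.
    """

    def __init__(self, n_states, n_actions,
                 alpha0=0.1, gamma=0.95, ri_discount=0.9, ri_lr=0.1):
        self.q = np.zeros((n_states, n_actions))
        self.ri = np.ones(n_states)     # initial estimates are unreliable
        self.alpha0 = alpha0            # base learning coefficient
        self.gamma = gamma              # reward discount rate
        self.ri_discount = ri_discount  # RI-discount rate: how future-state
                                        # RI feeds back into the current state
        self.ri_lr = ri_lr              # RI update rate (assumed)

    def select_action(self, s, rng):
        # Boltzmann rule with the RI standing in for the temperature:
        # unreliable estimates (large RI) -> flatter distribution ->
        # more exploration; reliable estimates -> greedier choices.
        temp = max(self.ri[s], 1e-3)    # floor is a sketch-level safeguard
        logits = self.q[s] / temp
        logits -= logits.max()          # for numerical stability
        p = np.exp(logits)
        p /= p.sum()
        return rng.choice(len(p), p=p)

    def update(self, s, a, r, s_next):
        td_error = r + self.gamma * self.q[s_next].max() - self.q[s, a]
        # Hypothetical RI update: blend |TD error| with the discounted RI
        # of the successor state, mirroring the RI-discount rate mentioned
        # in the abstract.
        target_ri = abs(td_error) + self.ri_discount * self.ri[s_next]
        self.ri[s] += self.ri_lr * (target_ri - self.ri[s])
        # Adaptive learning coefficient: learn faster where estimates are
        # unreliable, slower where they are already trusted.
        alpha = self.alpha0 * min(self.ri[s], 1.0)
        self.q[s, a] += alpha * td_error
```

Usage is the standard Q-learning loop, e.g. `agent = RIQLearner(16, 4)` with `rng = np.random.default_rng(0)`, calling `select_action` and `update` each step. Capping the learning coefficient at `alpha0` is likewise a safeguard chosen for this sketch, not a detail from the paper.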
Year
2004
DOI
10.1016/j.neunet.2004.05.004
Venue
Neural Networks
Keywords
internal estimator, td learning, discount rate, adaptive action selection, adaptive action-selection method, internal prediction, value function, weighted-sum type value function, temperature parameter, exploration–exploitation balance, reliability, meta-learning, model-free reinforcement learning, internal variable, ri value, ri-based q-learning system, statistical test, indexation, adaptive learning, action selection, temporal difference learning, reinforcement learning
Field
Temporal difference learning, Midpoint, Bellman equation, Artificial intelligence, Action selection, Adaptive learning, Mathematics, Machine learning, Statistical hypothesis testing, Estimator, Reinforcement learning
DocType
Journal
Volume
17
Issue
7
ISSN
0893-6080
Citations
4
PageRank
0.76
References
12
Authors
2
Name              Order  Citations  PageRank
Yutaka Sakaguchi  1      26         7.81
M. Takano         2      5          1.55