Title
The Uncertainty Bellman Equation and Exploration.
Abstract
We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps. We prove that the unique fixed point of the UBE yields an upper bound on the variance of the estimated value of any fixed policy. This bound can be much tighter than traditional count-based bonuses that compound standard deviation rather than variance. Importantly, and unlike several existing approaches to optimism, this method scales naturally to large systems with complex generalization. Substituting our UBE-exploration strategy for $\epsilon$-greedy improves DQN performance on 51 out of 57 games in the Atari suite.
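For context, the recursion the abstract describes can be written schematically as a Bellman-style equation over uncertainties; the notation below is illustrative rather than taken from the paper, with $\nu(s,a)$ standing for a local per-state-action uncertainty term whose precise definition the paper provides, and the discounted form shown as one common setting:

\[
u^{\pi}(s,a) \;=\; \nu(s,a) \;+\; \gamma^{2} \sum_{s'} P(s' \mid s,a) \sum_{a'} \pi(a' \mid s')\, u^{\pi}(s',a').
\]

Read this way, the fixed point $u^{\pi}$ propagates variance (rather than standard deviation) through expected transitions, which is the sense in which the abstract claims the resulting bound can be tighter than count-based bonuses that accumulate standard deviations step by step.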
Year: 2018
Venue: International Conference on Machine Learning
DocType: Conference
Volume: abs/1709.05380
Citations: 12
PageRank: 0.53
References: 25
Authors: 4
Name | Order | Citations | PageRank
Brendan O'Donoghue | 1 | 172 | 10.19
Ian Osband | 2 | 284 | 15.35
Rémi Munos | 3 | 2240 | 157.06
Volodymyr Mnih | 4 | 3796 | 158.28