Abstract |
---|
We consider the exploration-exploitation trade-off in reinforcement learning and show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the 'temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count-based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action pair and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice. |
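The abstract's implementation recipe — add a bonus to the reward at each state-action pair, solve the resulting Bellman equation, and act with a Boltzmann policy over the resulting values — can be sketched as follows. This is a minimal illustrative sketch, not the paper's algorithm: it assumes a known tabular MDP with a discounted (rather than finite-horizon) Bellman equation, and the exploration bonus `bonus`, temperature `tau`, and discount `gamma` are supplied externally; the paper derives the actual bonus and temperature schedule.

```python
import numpy as np

def k_learning_sketch(P, R, bonus, tau, gamma=0.9, num_iters=200):
    """Hedged sketch of the K-learning recipe from the abstract.

    P:     transition probabilities, shape (S, A, S) -- assumed known
    R:     expected rewards, shape (S, A)
    bonus: exploration bonus per state-action, shape (S, A) -- assumed given
    tau:   Boltzmann temperature (the risk-seeking parameter)
    """
    S, A, _ = P.shape
    K = np.zeros((S, A))  # optimistic "K-values"
    for _ in range(num_iters):
        # Soft (log-sum-exp) state value at temperature tau, consistent
        # with a Boltzmann policy over the K-values at the next state.
        V = tau * np.log(np.sum(np.exp(K / tau), axis=1))
        # Bellman update with the bonus-augmented reward.
        K = (R + bonus) + gamma * (P @ V)
    # Boltzmann exploration policy induced by the K-values.
    policy = np.exp(K / tau)
    policy /= policy.sum(axis=1, keepdims=True)
    return K, policy
```

As `tau` shrinks, the policy concentrates on the greedy action; larger `tau` (more risk-seeking) spreads probability mass and hence explores more, which is the trade-off the abstract's temperature parameter controls.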
Year | Venue | Field |
---|---|---|
2018 | arXiv: Learning | Discrete mathematics, Mathematical optimization, Time horizon, Regret, Upper and lower bounds, Bellman equation, Principle of maximum entropy, Boltzmann distribution, Mathematics, Reinforcement learning |
DocType | Volume | Citations |
---|---|---|
Journal | abs/1807.09647 | 1 |

PageRank | References | Authors |
---|---|---|
0.35 | 36 | 1 |
Name | Order | Citations | PageRank |
---|---|---|---|
Brendan O'Donoghue | 1 | 172 | 10.19 |