Abstract | ||
---|---|---|
In this work, we address risk-averse Bayes-adaptive reinforcement learning. We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs). We show that a policy optimising CVaR in this setting is risk-averse to both the epistemic uncertainty due to the prior distribution over MDPs, and the aleatoric uncertainty due to the inherent stochasticity of MDPs. We reformulate the problem as a two-player stochastic game and propose an approximate algorithm based on Monte Carlo tree search and Bayesian optimisation. Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem. |
Year | Venue | DocType |
---|---|---|
2021 | Annual Conference on Neural Information Processing Systems | Conference |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Marc Rigter | 1 | 4 | 1.78 |
Bruno Lacerda | 2 | 85 | 12.96 |
Nick Hawes | 3 | 0 | 0.34 |