Abstract
---

Model-based Bayesian Reinforcement Learning (BRL) allows a sound formalization of the problem of acting optimally while facing an unknown environment, i.e., avoiding the exploration-exploitation dilemma. However, algorithms explicitly addressing BRL suffer from such a combinatorial explosion that a large body of work relies on heuristic algorithms. This paper introduces BOLT, a simple and (almost) deterministic heuristic algorithm for BRL that is optimistic about the transition function. We analyze BOLT's sample complexity and show that, under certain parameters, the algorithm is near-optimal in the Bayesian sense with high probability. Experimental results then highlight the key differences between this method and previous work.
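The abstract says BOLT is "optimistic about the transition function." One plausible reading of that idea, as a hedged sketch only (the function name, the `eta` bonus-count scheme, and the tabular Dirichlet setup are illustrative assumptions, not the paper's exact construction): maintain Dirichlet counts over transitions and, for each state-action pair, imagine a few extra transitions to whichever successor state would make the Bellman backup largest.

```python
import numpy as np

def bolt_optimistic_backup(alpha, R, V, gamma=0.95, eta=5.0):
    """One optimistic Bellman backup in the spirit of BOLT (illustrative sketch).

    alpha : Dirichlet transition counts, shape (S, A, S)
    R     : reward table, shape (S, A)
    V     : current value estimates, shape (S,)
    eta   : optimism parameter -- number of imagined extra transitions
    """
    S, A, _ = alpha.shape
    Q = np.zeros((S, A))
    for s in range(S):
        for a in range(A):
            best = -np.inf
            for s_star in range(S):  # candidate optimistic successor
                counts = alpha[s, a].copy()
                counts[s_star] += eta          # add eta imagined transitions to s_star
                p = counts / counts.sum()      # boosted posterior-mean transition
                best = max(best, R[s, a] + gamma * p @ V)
            Q[s, a] = best
    return Q
```

With `eta = 0` this reduces to the ordinary posterior-mean backup; any `eta > 0` can only raise each Q-value, since the maximization over `s_star` includes shifting probability mass toward the highest-valued successor.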
Year | Venue | DocType | Volume | Citations | PageRank | References | Authors
---|---|---|---|---|---|---|---
2012 | ICML | Conference | abs/1206.4613 | 21 | 1.03 | 7 | 3
Name | Order | Citations | PageRank |
---|---|---|---|
Mauricio Araya-López | 1 | 50 | 3.48 |
Olivier Buffet | 2 | 258 | 26.77 |
Thomas Vincent | 3 | 320 | 27.52 |