Abstract | ||
---|---|---|
In this presentation, we address generic multi-armed bandit problems with stochastic rewards and known structure. Our notion of structure is generic and includes well-studied bandit structures such as linear, combinatorial, unimodal, Lipschitz, dueling etc. We propose a generic algorithm and prove its asymptotic optimality when the time horizon goes to infinity. We further propose a finite time regret analysis of our algorithm. As a byproduct of our analysis we develop several novel technical results which are useful to analyze generic bandit problems. More details can be found in the technical report https://arxiv.org/abs/1711.00400. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/ALLERTON.2018.8635908 | Allerton |
Keywords | Field | DocType |
Writing,Communications technology | Mathematical optimization,Time horizon,Regret,Computer science,Infinity,Algorithm,Lipschitz continuity,Asymptotically optimal algorithm,Technical report,Genetic algorithm,Finite time | Conference |
ISSN | ISBN | Citations |
2474-0195 | 978-1-5386-6596-1 | 0 |
PageRank | References | Authors |
0.34 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Richard Combes | 1 | 170 | 20.11 |
Stefan Magureanu | 2 | 29 | 2.82 |
Alexandre Proutiere | 3 | 558 | 40.94 |