Abstract | ||
---|---|---|
We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they involve no unnecessary complications. |
Year | DOI | Venue |
---|---|---|
2016 | 10.1287/moor.2017.0928 | MATHEMATICS OF OPERATIONS RESEARCH |
Keywords | Field | DocType |
multiarmed bandits,cumulative regret,information-theoretic proof techniques,nonasymptotic lower bounds | Mathematical optimization,Mathematical economics,Regret,Logarithmic growth,Exploit,Mathematical proof,Mathematics | Journal |
Volume | Issue | ISSN |
44 | 2 | 0364-765X |
Citations | PageRank | References |
16 | 0.94 | 8 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Aurélien Garivier | 1 | 547 | 33.15 |
Pierre Ménard | 2 | 19 | 6.45 |
Gilles Stoltz | 3 | 351 | 31.53 |