Title
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
Abstract
We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they involve no unnecessary complications.
Year
DOI
Venue
2016
10.1287/moor.2017.0928
MATHEMATICS OF OPERATIONS RESEARCH
Keywords
Field
DocType
multiarmed bandits,cumulative regret,information-theoretic proof techniques,nonasymptotic lower bounds
Mathematical optimization,Mathematical economics,Regret,Logarithmic growth,Exploit,Mathematical proof,Mathematics
Journal
Volume
Issue
ISSN
44
2
0364-765X
Citations 
PageRank 
References 
16
0.94
8
Authors
3
Name
Order
Citations
PageRank
Aurélien Garivier154733.15
Pierre Ménard2196.45
Gilles Stoltz335131.53