Explore First, Exploit Next: The True Shape of Regret in Bandit Problems - Citegraph

Paper Info

Title
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Abstract
We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they involve no unnecessary complications.

Year	DOI	Venue
2016	10.1287/moor.2017.0928	MATHEMATICS OF OPERATIONS RESEARCH
Keywords	Field	DocType
multiarmed bandits,cumulative regret,information-theoretic proof techniques,nonasymptotic lower bounds	Mathematical optimization,Mathematical economics,Regret,Logarithmic growth,Exploit,Mathematical proof,Mathematics	Journal
Volume	Issue	ISSN
44	2	0364-765X
Citations	PageRank	References
16	0.94	8
Authors
3

Authors (3 rows)

Cited by (16 rows)

References (8 rows)

Name	Order	Citations	PageRank
Aurélien Garivier	1	547	33.15
Pierre Ménard	2	19	6.45
Gilles Stoltz	3	351	31.53

1