Title
Pure exploration in multi-armed bandits problems
Abstract
We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that perform an online exploration of the arms. These strategies are assessed in terms of their simple regret, a notion of regret that captures the fact that exploration is constrained only by the number of available rounds (not necessarily known in advance), in contrast to the cumulative regret, for which exploitation must be performed at the same time as exploration. We believe that this performance criterion is suited to situations in which the cost of pulling an arm is expressed in terms of resources rather than rewards. We discuss the links between the simple and the cumulative regret. The main result is that the required exploration-exploitation trade-offs are qualitatively different, in view of a general lower bound on the simple regret in terms of the cumulative regret.
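The abstract contrasts simple regret (the suboptimality of a single arm recommended after exploration) with cumulative regret (losses accumulated over all pulls). A minimal sketch of how the two quantities are computed, assuming Bernoulli arms and a naive round-robin exploration strategy (an illustration only, not one of the paper's strategies):

```python
import random

def simulate(n_rounds, means, seed=0):
    """Explore K Bernoulli arms round-robin, then recommend the
    arm with the highest empirical mean (illustrative only)."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    reward_sums = [0.0] * k
    total_reward = 0.0
    for t in range(n_rounds):
        arm = t % k  # uniform (round-robin) exploration
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward
        total_reward += reward
    best_mean = max(means)
    # Recommendation after exploration: empirically best arm
    rec = max(range(k), key=lambda i: reward_sums[i] / pulls[i])
    # Simple regret: suboptimality of the single recommended arm
    simple_regret = best_mean - means[rec]
    # Cumulative regret: shortfall of all collected rewards
    cumulative_regret = n_rounds * best_mean - total_reward
    return simple_regret, cumulative_regret

sr, cr = simulate(3000, [0.5, 0.6, 0.8])
```

Note that the simple regret depends only on the final recommendation, while the cumulative regret charges every suboptimal pull made during exploration; this is the distinction underlying the paper's lower bound.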
Year
2008
Venue
ALT'09 Proceedings of the 20th International Conference on Algorithmic Learning Theory
Keywords
cumulative regret, simple regret, regret notion, online exploration, available round, exploitation need, main result, performance criterion, required exploration-exploitation trade-offs, stochastic multi-armed bandit problem, multi-armed bandits problem, pure exploration
Volume
abs/0802.2
ISSN
0302-9743
ISBN
3-642-04413-1
Citations
6
PageRank
0.95
References
9
Authors
3
Name              Order  Citations  PageRank
Sébastien Bubeck  1      1472       92.28
Rémi Munos        2      2240       157.06
Gilles Stoltz     3      351        31.53