Title
Stochastic Linear Optimization under Bandit Feedback
Abstract
In the classical stochastic k-armed bandit problem, ineachofasequenceofT rounds, adecisionmaker chooses one of k arms and incurs a cost chosen from an unknown distribution associated with that arm. The goal is to minimize regret, defined as the difference between the cost incurred by the algo- rithm and the optimal cost. In the linear optimization version of this problem (firstconsideredbyAuer(2002)), weviewthearms as vectors in Rn, and require that the costs be lin- ear functions of the chosen vector. As before, it is assumed that the cost functions are sampled in- dependently from an unknown distribution. In this setting, the goal is to find algorithms whose run- ning time and regret behave well as functions of the number of rounds T and the dimensionality n (rather than the number of arms, k, which may be exponential in n or even infinite). We give a nearly complete characterization of this problem in terms of both upper and lower bounds for the regret. In certain special cases (such as when the decision region is a polytope), the regret is polylog(T). In general though, the optimal re- gret is ( p T) — our lower bounds rule out the possibility of obtaining polylog(T) rates in gen- eral. We present two variants of an algorithm based on the idea of "upper confidence bounds." The first, due to Auer (2002), but not fully analyzed, obtains regret whose dependence on n and T are both es- sentially optimal, but which may be computation- ally intractable when the decision set is a polytope. The second version can be efficiently implemented when the decision set is a polytope (given as an in- tersection of half-spaces), but gives up a factor of p n in the regret bound. Our results also extend to the setting where the set of allowed decisions may change over time.
Year
Venue
Keywords
2008
COLT
cost function,linear optimization,lower bound,upper and lower bounds
Field
DocType
Citations 
Combinatorics,Mathematical optimization,Exponential function,Regret,Computer science,Upper and lower bounds,Curse of dimensionality,Polytope,Linear programming,Linear function,Decision maker
Conference
162
PageRank 
References 
Authors
10.12
7
3
Search Limit
100162
Name
Order
Citations
PageRank
Varsha Dani143240.05
Thomas P. Hayes265954.21
Sham Kakade34365282.77