Abstract | ||
---|---|---|
In the classical stochastic k-armed bandit problem, ineachofasequenceofT rounds, adecisionmaker chooses one of k arms and incurs a cost chosen from an unknown distribution associated with that arm. The goal is to minimize regret, defined as the difference between the cost incurred by the algo- rithm and the optimal cost. In the linear optimization version of this problem (firstconsideredbyAuer(2002)), weviewthearms as vectors in Rn, and require that the costs be lin- ear functions of the chosen vector. As before, it is assumed that the cost functions are sampled in- dependently from an unknown distribution. In this setting, the goal is to find algorithms whose run- ning time and regret behave well as functions of the number of rounds T and the dimensionality n (rather than the number of arms, k, which may be exponential in n or even infinite). We give a nearly complete characterization of this problem in terms of both upper and lower bounds for the regret. In certain special cases (such as when the decision region is a polytope), the regret is polylog(T). In general though, the optimal re- gret is ( p T) — our lower bounds rule out the possibility of obtaining polylog(T) rates in gen- eral. We present two variants of an algorithm based on the idea of "upper confidence bounds." The first, due to Auer (2002), but not fully analyzed, obtains regret whose dependence on n and T are both es- sentially optimal, but which may be computation- ally intractable when the decision set is a polytope. The second version can be efficiently implemented when the decision set is a polytope (given as an in- tersection of half-spaces), but gives up a factor of p n in the regret bound. Our results also extend to the setting where the set of allowed decisions may change over time. |
Year | Venue | Keywords |
---|---|---|
2008 | COLT | cost function,linear optimization,lower bound,upper and lower bounds |
Field | DocType | Citations |
Combinatorics,Mathematical optimization,Exponential function,Regret,Computer science,Upper and lower bounds,Curse of dimensionality,Polytope,Linear programming,Linear function,Decision maker | Conference | 162 |
PageRank | References | Authors |
10.12 | 7 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Varsha Dani | 1 | 432 | 40.05 |
Thomas P. Hayes | 2 | 659 | 54.21 |
Sham Kakade | 3 | 4365 | 282.77 |