Stochastic Linear Optimization under Bandit Feedback - Citegraph

Paper Info

Title
Stochastic Linear Optimization under Bandit Feedback

Abstract
In the classical stochastic k-armed bandit problem, ineachofasequenceofT rounds, adecisionmaker chooses one of k arms and incurs a cost chosen from an unknown distribution associated with that arm. The goal is to minimize regret, defined as the difference between the cost incurred by the algo- rithm and the optimal cost. In the linear optimization version of this problem (firstconsideredbyAuer(2002)), weviewthearms as vectors in Rn, and require that the costs be lin- ear functions of the chosen vector. As before, it is assumed that the cost functions are sampled in- dependently from an unknown distribution. In this setting, the goal is to find algorithms whose run- ning time and regret behave well as functions of the number of rounds T and the dimensionality n (rather than the number of arms, k, which may be exponential in n or even infinite). We give a nearly complete characterization of this problem in terms of both upper and lower bounds for the regret. In certain special cases (such as when the decision region is a polytope), the regret is polylog(T). In general though, the optimal re- gret is ( p T) — our lower bounds rule out the possibility of obtaining polylog(T) rates in gen- eral. We present two variants of an algorithm based on the idea of "upper confidence bounds." The first, due to Auer (2002), but not fully analyzed, obtains regret whose dependence on n and T are both es- sentially optimal, but which may be computation- ally intractable when the decision set is a polytope. The second version can be efficiently implemented when the decision set is a polytope (given as an in- tersection of half-spaces), but gives up a factor of p n in the regret bound. Our results also extend to the setting where the set of allowed decisions may change over time.

Year	Venue	Keywords
2008	COLT	cost function,linear optimization,lower bound,upper and lower bounds
Field	DocType	Citations
Combinatorics,Mathematical optimization,Exponential function,Regret,Computer science,Upper and lower bounds,Curse of dimensionality,Polytope,Linear programming,Linear function,Decision maker	Conference	162
PageRank	References	Authors
10.12	7	3

Search Limit

100162

Authors (3 rows)

Cited by (100 rows)

References (7 rows)

Name	Order	Citations	PageRank
Varsha Dani	1	432	40.05
Thomas P. Hayes	2	659	54.21
Sham Kakade	3	4365	282.77

1