Title
The Price of Bandit Information for Online Optimization
Abstract
In the online linear optimization problem, a learner must choose, in each round, a decision from a set $\mathcal{D} \subseteq \mathbb{R}^n$ in order to minimize an (unknown and changing) linear cost function. We present sharp rates of convergence (with respect to additive regret) for both the full information setting (where the cost function is revealed at the end of each round) and the bandit setting (where only the scalar cost incurred is revealed). In particular, this paper is concerned with the price of bandit information, by which we mean the ratio of the best achievable regret in the bandit setting to that in the full-information setting. For the full information case, the upper bound on the regret is $O(\sqrt{nT})$, where $n$ is the ambient dimension and $T$ is the time horizon. For the bandit case, we present an algorithm which achieves $O(n^{3/2}\sqrt{T})$ regret; all previous (nontrivial) bounds here were $O(\mathrm{poly}(n)\,T^{2/3})$ or worse. It is striking that the convergence rate for the bandit setting is only a factor of $n$ worse than in the full information case, in stark contrast to the $K$-arm bandit setting, where the gap in the dependence on $K$ is exponential ($\sqrt{TK}$ vs. $\sqrt{T \log K}$). We also present lower bounds showing that this gap is at least $\sqrt{n}$, which we conjecture to be the correct order. The bandit algorithm we present can be implemented efficiently in special cases of particular interest, such as path planning and Markov Decision Problems.
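For context, here is a minimal LaTeX sketch of the additive-regret quantity the abstract's bounds refer to. The definition below is the standard online linear optimization formalization and is an assumption on our part, since the record's abstract does not spell it out; the symbols $x_t$ and $c_t$ are introduced here purely for illustration.

% Assumed standard setup (not stated explicitly in this record):
% in rounds t = 1, ..., T the learner picks a decision x_t in D \subseteq R^n,
% an adversary picks a cost vector c_t, and the learner pays the scalar c_t^T x_t.
\[
  \mathrm{Regret}_T \;=\; \sum_{t=1}^{T} c_t^{\top} x_t
  \;-\; \min_{x \in \mathcal{D}} \sum_{t=1}^{T} c_t^{\top} x .
\]
% Under this definition, the abstract's rates read: full-information
% regret O(\sqrt{nT}) versus bandit regret O(n^{3/2} \sqrt{T}), so the
% "price of bandit information" is between \sqrt{n} and n.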
Year
2007
Venue
NIPS
Keywords
cost function, path planning, upper bound, rate of convergence, linear optimization, lower bound, convergence rate
Field
Convergence (routing), Mathematical optimization, Decision problem, Time horizon, Regret, Upper and lower bounds, Markov chain, Artificial intelligence, Multi-armed bandit, Rate of convergence, Mathematics, Machine learning
DocType
Conference
Citations
71
PageRank
7.41
References
10
Authors
3
Name             Order  Citations  PageRank
Varsha Dani      1      432        40.05
Thomas P. Hayes  2      659        54.21
Sham Kakade      3      4365       282.77