Abstract |
---|
We present a modification of the algorithm of Dani et al. (8) for the online linear optimization problem in the bandit setting, which with high probability has regret at most O(√T) against an adaptive adversary. This improves on the previous algorithm (8), whose regret is bounded only in expectation and only against an oblivious adversary. We obtain the same dependence on the dimension (n^{3/2}) as that exhibited by Dani et al. The results of this paper rest firmly on those of (8) and the remarkable technique of Auer et al. (2) for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information vs. bandit settings. |
Year | Venue | Keywords
---|---|---|
2008 | COLT | linear optimization

Field | DocType | Citations
---|---|---|
Linear optimization problem, Mathematical optimization, Regret, Computer science, Linear programming, Adversary, Bounded function | Conference | 13

PageRank | References | Authors
---|---|---|
1.34 | 7 | 6
Name | Order | Citations | PageRank |
---|---|---|---|
Peter L. Bartlett | 1 | 5482 | 1039.97 |
Varsha Dani | 2 | 432 | 40.05 |
Thomas P. Hayes | 3 | 659 | 54.21 |
Sham Kakade | 4 | 4365 | 282.77 |
Alexander Rakhlin | 5 | 732 | 62.84 |
Ambuj Tewari | 6 | 1371 | 99.22 |