Abstract | ||
---|---|---|
Most methods for decision-theoretic online learning are based on the Hedge algorithm, which takes a parameter called the learning rate. In most previous analyses the learning rate was carefully tuned to obtain optimal worst-case performance, leading to suboptimal performance on easy instances, for example when there exists an action that is significantly better than all others. We propose a new way of setting the learning rate, which adapts to the difficulty of the learning problem: in the worst case our procedure still guarantees optimal performance, but on easy instances it achieves much smaller regret. In particular, our adaptive method achieves constant regret in a probabilistic setting, when there exists an action that on average obtains strictly smaller loss than all other actions. We also provide a simulation study comparing our approach to existing methods. |
Year | Venue | DocType |
---|---|---|
2011 | NIPS | Conference |
ISSN | Citations | PageRank |
Advances in Neural Information Processing Systems 24, pages
1656-1664, December 2011 | 0 | 0.34 |
References | Authors | |
1 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Tim van Erven | 1 | 218 | 17.39 |
Peter Grunwald | 2 | 113 | 11.40 |
Wouter M. Koolen | 3 | 87 | 17.35 |
Steven de Rooij | 4 | 73 | 10.16 |