Abstract

We study two randomized algorithms for generalized linear bandits, GLM-TSL and GLM-FPL. GLM-TSL samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. GLM-FPL, a new algorithm proposed in this work, fits a GLM to a randomly perturbed history of past rewards. We prove a $\tilde{O}(d \sqrt{n} + d^2)$ upper bound on the $n$-round regret of GLM-TSL, where $d$ is the number of features. This is the first regret bound of a Thompson sampling-like algorithm in GLM bandits where the leading term is $\tilde{O}(d \sqrt{n})$. We apply both GLM-TSL and GLM-FPL to logistic and neural network bandits, and show that they perform well empirically. In more complex models, GLM-FPL is significantly faster. Our results showcase the role of randomization, beyond posterior sampling, in exploration.
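As a rough illustration of the two randomization schemes described in the abstract, here is a minimal sketch for the logistic (Bernoulli reward) case. The Newton-based MAP fit, the function names, and the exploration scale `a` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_map(X, y, lam=1.0, iters=50):
    """MAP estimate of a logistic GLM with Gaussian prior N(0, lam^{-1} I),
    computed by Newton's method. Returns the estimate and the Hessian at it.
    Assumes a nonempty history X (n x d), y (n,)."""
    d = X.shape[1]
    theta = np.zeros(d)
    H = lam * np.eye(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + lam * theta
        H = X.T @ (X * (p * (1 - p))[:, None]) + lam * np.eye(d)
        theta -= np.linalg.solve(H, grad)
    return theta, H

def glm_tsl_choose(arms, X_hist, y_hist, a=1.0, lam=1.0):
    """One round of GLM-TSL: sample a model from the Laplace approximation
    N(theta_hat, a^2 H^{-1}) to the posterior, then act greedily."""
    theta_hat, H = fit_logistic_map(X_hist, y_hist, lam)
    theta_tilde = np.random.multivariate_normal(theta_hat, a**2 * np.linalg.inv(H))
    return int(np.argmax(arms @ theta_tilde))

def glm_fpl_choose(arms, X_hist, y_hist, a=1.0, lam=1.0):
    """One round of GLM-FPL: refit the GLM to the history with i.i.d.
    Gaussian perturbations added to past rewards, then act greedily.
    (The fitting objective stays convex for non-binary targets.)"""
    z = y_hist + a * np.random.randn(len(y_hist))
    theta_hat, _ = fit_logistic_map(X_hist, z, lam)
    return int(np.argmax(arms @ theta_hat))
```

In a bandit loop, either `*_choose` function would be called each round with the arm feature matrix and the history gathered so far; the key contrast is that GLM-FPL never forms a covariance matrix, which is what makes it cheaper in more complex models.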
Year | Venue | DocType
---|---|---
2019 | CoRR | Journal

Volume | Citations | PageRank
---|---|---
abs/1906.08947 | 0 | 0.34

References | Authors
---|---
0 | 6
Name | Order | Citations | PageRank |
---|---|---|---
Branislav Kveton | 1 | 455 | 49.32 |
Manzil Zaheer | 2 | 160 | 23.65 |
Csaba Szepesvári | 3 | 0 | 0.68 |
Lihong Li | 4 | 2390 | 128.53 |
Mohammad Ghavamzadeh | 5 | 814 | 67.73 |
Craig Boutilier | 6 | 6864 | 621.05 |