Abstract

We study two randomized algorithms for generalized linear bandits, GLM-TSL and GLM-FPL. GLM-TSL samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. GLM-FPL, a new algorithm proposed in this work, fits a GLM to a randomly perturbed history of past rewards. We prove a $\tilde{O}(d \sqrt{n} + d^2)$ upper bound on the $n$-round regret of GLM-TSL, where $d$ is the number of features. This is the first regret bound of a Thompson sampling-like algorithm in GLM bandits where the leading term is $\tilde{O}(d \sqrt{n})$. We apply both GLM-TSL and GLM-FPL to logistic and neural network bandits, and show that they perform well empirically. In more complex models, GLM-FPL is significantly faster. Our results showcase the role of randomization, beyond posterior sampling, in exploration.
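As a rough illustration of the two randomization schemes described in the abstract, here is a minimal sketch for the logistic (Bernoulli reward) case. The Newton-based MAP fit, the function names, and the exploration scale `a` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_map(X, y, lam=1.0, iters=50):
    """MAP estimate of a logistic GLM with Gaussian prior N(0, lam^{-1} I),
    computed by Newton's method. Returns the estimate and the Hessian at it.
    Assumes a nonempty history X (n x d), y (n,)."""
    d = X.shape[1]
    theta = np.zeros(d)
    H = lam * np.eye(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + lam * theta
        H = X.T @ (X * (p * (1 - p))[:, None]) + lam * np.eye(d)
        theta -= np.linalg.solve(H, grad)
    return theta, H

def glm_tsl_choose(arms, X_hist, y_hist, a=1.0, lam=1.0):
    """One round of GLM-TSL: sample a model from the Laplace approximation
    N(theta_hat, a^2 H^{-1}) to the posterior, then act greedily."""
    theta_hat, H = fit_logistic_map(X_hist, y_hist, lam)
    theta_tilde = np.random.multivariate_normal(theta_hat, a**2 * np.linalg.inv(H))
    return int(np.argmax(arms @ theta_tilde))

def glm_fpl_choose(arms, X_hist, y_hist, a=1.0, lam=1.0):
    """One round of GLM-FPL: refit the GLM to the history with i.i.d.
    Gaussian perturbations added to past rewards, then act greedily.
    (The fitting objective stays convex for non-binary targets.)"""
    z = y_hist + a * np.random.randn(len(y_hist))
    theta_hat, _ = fit_logistic_map(X_hist, z, lam)
    return int(np.argmax(arms @ theta_hat))
```

In a bandit loop, either `*_choose` function would be called each round with the arm feature matrix and the history gathered so far; the key contrast is that GLM-FPL never forms a covariance matrix, which is what makes it cheaper in more complex models.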
Year | Venue | DocType
---|---|---
2019 | CoRR | Journal

Volume | Citations | PageRank
---|---|---
abs/1906.08947 | 0 | 0.34

References | Authors
---|---
0 | 6
Name | Order | Citations | PageRank |
---|---|---|---
Branislav Kveton | 1 | 455 | 49.32 |
Manzil Zaheer | 2 | 160 | 23.65 |
Csaba Szepesvári | 3 | 0 | 0.68 |
Lihong Li | 4 | 2390 | 128.53 |
Mohammad Ghavamzadeh | 5 | 814 | 67.73 |
Craig Boutilier | 6 | 6864 | 621.05 |