Abstract |
---|
We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments. We provide theoretical guarantees on the behavior of D-LinUCB in both slowly-varying and abruptly-changing environments. We obtain an upper bound on the dynamic regret that is of order d^{2/3} B_T^{1/3} T^{2/3}, where B_T is a measure of non-stationarity (d and T being, respectively, the dimension and the horizon). This rate is known to be optimal. We also illustrate the empirical performance of D-LinUCB and compare it with recently proposed alternatives in simulated environments. |
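The discounted linear regression and optimistic selection described in the abstract can be sketched as follows. This is an illustrative Python sketch only: the class and parameter names are hypothetical, and the fixed exploration width `alpha` is a simplification standing in for the paper's deviation-based confidence bonus.

```python
import numpy as np

class DiscountedLinUCBSketch:
    """Hypothetical sketch of a discounted-least-squares bandit.

    Past observations are down-weighted by a discount factor gamma,
    so the estimator tracks a slowly-varying regression parameter.
    """

    def __init__(self, d, gamma=0.99, lam=1.0, alpha=1.0):
        self.gamma = gamma          # discount factor in (0, 1): exponentially forgets the past
        self.lam = lam              # ridge regularization level
        self.alpha = alpha          # exploration width (simplified confidence bonus)
        self.V = lam * np.eye(d)    # discounted, regularized design matrix
        self.b = np.zeros(d)        # discounted reward-weighted feature sum

    def update(self, x, r):
        d = len(x)
        # Discount past statistics, add the new observation; the extra
        # (1 - gamma) * lam * I term keeps the regularization fixed at lam * I.
        self.V = self.gamma * self.V + np.outer(x, x) + (1 - self.gamma) * self.lam * np.eye(d)
        self.b = self.gamma * self.b + r * x

    def select(self, arms):
        # Weighted least-squares estimate, then optimism in the face of uncertainty.
        theta = np.linalg.solve(self.V, self.b)
        Vinv = np.linalg.inv(self.V)
        ucb = [x @ theta + self.alpha * np.sqrt(x @ Vinv @ x) for x in arms]
        return int(np.argmax(ucb))
```

In a stationary regime the discounting merely slows concentration slightly, while after an abrupt change the exponential weights let stale observations decay geometrically instead of biasing the estimate indefinitely.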
Year | Venue | Field |
---|---|---|
2019 | ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | Mathematical optimization,Exponential function,Regret,Regression,Upper and lower bounds,Computer science,Horizon,Estimator,Linear regression |
DocType | Volume | ISSN
---|---|---|
Conference | 32 | 1049-5258
Citations | PageRank | References
---|---|---|
0 | 0.34 | 0
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Russac, Yoan | 1 | 0 | 1.35 |
Vernade, Claire | 2 | 10 | 1.93 |
O. Cappe | 3 | 2112 | 207.95 |