Abstract |
---|
We study the non-stationary stochastic multi-armed bandit (MAB) problem and propose two generic algorithms: Limited Memory Deterministic Sequencing of Exploration and Exploitation (LM-DSEE) and Sliding-Window Upper Confidence Bound# (SW-UCB#). We rigorously analyze these algorithms in abruptly-changing and slowly-varying environments and characterize their performance. We show that the expected cumulative regret for these algorithms in either environment is upper bounded by a sublinear function of time, i.e., the time average of the regret asymptotically converges to zero. We complement our analysis with numerical illustrations. |
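
The abstract describes sliding-window UCB-type indices for non-stationary bandits. The paper's SW-UCB# uses its own window schedule and constants, which are not reproduced here; the following is a minimal sketch of the generic sliding-window UCB idea (windowed empirical mean plus a confidence radius), with the window length and exploration constant `c` chosen purely for illustration.

```python
import math
from collections import deque

class SlidingWindowUCB:
    """Generic sliding-window UCB sketch: indices are computed only from
    the most recent `window` plays, so abrupt changes in the reward
    distributions are eventually forgotten. Not the paper's SW-UCB#."""

    def __init__(self, n_arms, window, c=2.0):
        self.n_arms = n_arms
        self.history = deque(maxlen=window)  # (arm, reward) pairs in the window
        self.c = c  # exploration constant (assumed value, for illustration)

    def select_arm(self):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # Play any arm unseen in the current window first.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        w = len(self.history)
        # Index = windowed mean + confidence radius over the window.
        return max(
            range(self.n_arms),
            key=lambda a: sums[a] / counts[a]
            + math.sqrt(self.c * math.log(w) / counts[a]),
        )

    def update(self, arm, reward):
        self.history.append((arm, reward))
```

In an abruptly-changing environment, rewards observed before a change-point leave the deque after at most `window` further plays, so the index tracks the current best arm at the cost of a wider confidence radius than stationary UCB.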
Year | Venue | DocType |
---|---|---|
2018 | ACC | Conference |

Volume | Citations | PageRank |
---|---|---|
abs/1802.08380 | 0 | 0.34 |

References | Authors |
---|---|
17 | 2 |

Name | Order | Citations | PageRank |
---|---|---|---|
Lai Wei | 1 | 8 | 4.98 |
Vaibhav Srivastava | 2 | 137 | 20.18 |