Multi-Armed Bandits with Non-Stationary Rewards. - Citegraph

Paper Info

Title
Multi-Armed Bandits with Non-Stationary Rewards.

Abstract
The multi-armed bandit problem where the rewards are realizations of general non-stationary stochastic processes is a challenging setting which has not been previously tackled in the bandit literature in its full generality. We present the first theoretical analysis of this problem by deriving guarantees for both the path-dependent dynamic pseudo-regret and the standard pseudo-regret that, remarkably, are both logarithmic in the number of rounds under certain natural conditions. We describe several UCB-type algorithms based on the notion of weighted discrepancy, a key measure of non-stationarity for stochastic processes. We show that discrepancy provides a unified framework for the analysis of non-stationary rewards. Our experiments demonstrate a significant improvement in practice compared to standard benchmarks.

Year	Venue	Field
2017	arXiv: Learning	Mathematical optimization, Computer science, Stochastic process, Logarithm, Generality
DocType	Volume	Citations
Journal	abs/1710.10657	0
PageRank	References	Authors
0.34	10	5

Authors (5 rows)

Name	Order	Citations	PageRank
Corinna Cortes	1	6574	1120.50
Giulia DeSalvo	2	73	6.45
Vitaly Kuznetsov	3	68	9.33
Mehryar Mohri	4	4502	448.21
Scott Yang	5	33	6.24