Title
Multi-Armed Bandits with Non-Stationary Rewards
Abstract
The multi-armed bandit problem where the rewards are realizations of general non-stationary stochastic processes is a challenging setting which has not been previously tackled in the bandit literature in its full generality. We present the first theoretical analysis of this problem by deriving guarantees for both the path-dependent dynamic pseudo-regret and the standard pseudo-regret that, remarkably, are both logarithmic in the number of rounds under certain natural conditions. We describe several UCB-type algorithms based on the notion of weighted discrepancy, a key measure of non-stationarity for stochastic processes. We show that discrepancy provides a unified framework for the analysis of non-stationary rewards. Our experiments demonstrate a significant improvement in practice compared to standard benchmarks.
Year: 2017
Venue: arXiv: Learning
Field: Mathematical optimization, Computer science, Stochastic process, Logarithm, Generality
DocType: Journal
Volume: abs/1710.10657
Citations: 0
PageRank: 0.34
References: 10
Authors: 5
Name | Order | Citations | PageRank
Corinna Cortes | 1 | 6574 | 1120.50
Giulia DeSalvo | 2 | 73 | 6.45
Vitaly Kuznetsov | 3 | 68 | 9.33
Mehryar Mohri | 4 | 4502 | 448.21
Yang, Scott | 5 | 33 | 6.24