Title: Incentives in the Dark: Multi-armed Bandits for Evolving Users with Unknown Type
Abstract: The design of incentives or recommendations for users is becoming more common as platform providers continually emerge. We propose a multi-armed bandit approach to the problem in which user types are unknown a priori and evolve dynamically in time. Unlike the traditional bandit setting, observed rewards are generated by a single Markov process. We demonstrate via an illustrative example that blindly applying the traditional bandit algorithms results in very poor performance as measured by regret. We introduce two variants of classical bandit algorithms, upper confidence bound (UCB) and epsilon-greedy, for which we provide theoretical bounds on the regret. We conduct a number of simulation-based experiments to show how the algorithms perform compared to traditional UCB and epsilon-greedy algorithms, as well as reinforcement learning (Q-learning).
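Since the abstract only names the baseline algorithms that the paper modifies, a minimal sketch of the classical UCB1 index rule may help situate the proposed variants. This is generic background code, not the authors' algorithm: the Bernoulli arm means, the horizon, and the function name ucb1 are illustrative assumptions.

import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given means; return cumulative regret."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # number of pulls per arm
    sums = [0.0] * k          # total reward per arm
    regret = 0.0
    best = max(arm_means)     # mean of the optimal arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # pull each arm once to initialize
        else:
            # UCB1 index: empirical mean plus an exploration bonus
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]
    return regret

if __name__ == "__main__":
    print(ucb1([0.3, 0.5, 0.7], horizon=10_000))

Note that this baseline assumes i.i.d. rewards per arm; the paper's point is precisely that such algorithms perform poorly when rewards are instead driven by a single evolving Markov process.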
Year: 2018
Venue: arXiv: Learning
Field: Markov process, Incentive, Regret, A priori and a posteriori, Artificial intelligence, Mathematics, Machine learning, Reinforcement learning
DocType:
Volume: abs/1803.04008
Citations: 0
Journal:
PageRank: 0.34
References: 14
Authors: 4
Name                 Order   Citations   PageRank
Lillian J. Ratliff   1       872         3.32
Shreyas Sekar        2       26          7.56
Liyuan Zheng         3       0           0.68
Tanner Fiez          4       2           1.49