Title: Incentives in the Dark: Multi-armed Bandits for Evolving Users with Unknown Type
Abstract: The design of incentives or recommendations for users is becoming more common as platform providers continually emerge. We propose a multi-armed bandit approach to the problem in which user types are unknown a priori and evolve dynamically in time. Unlike the traditional bandit setting, observed rewards are generated by a single Markov process. We demonstrate via an illustrative example that blindly applying the traditional bandit algorithms results in very poor performance as measured by regret. We introduce two variants of classical bandit algorithms, upper confidence bound (UCB) and epsilon-greedy, for which we provide theoretical bounds on the regret. We conduct a number of simulation-based experiments to show how the algorithms perform compared to traditional UCB and epsilon-greedy algorithms, as well as reinforcement learning (Q-learning).
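Since the abstract only names the baseline algorithms that the paper modifies, a minimal sketch of the classical UCB1 index rule may help situate the proposed variants. This is generic background code, not the authors' algorithm: the Bernoulli arm means, the horizon, and the function name ucb1 are illustrative assumptions.

import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given means; return cumulative regret."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # number of pulls per arm
    sums = [0.0] * k          # total reward per arm
    regret = 0.0
    best = max(arm_means)     # mean of the optimal arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # pull each arm once to initialize
        else:
            # UCB1 index: empirical mean plus an exploration bonus
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]
    return regret

if __name__ == "__main__":
    print(ucb1([0.3, 0.5, 0.7], horizon=10_000))

Note that this baseline assumes i.i.d. rewards per arm; the paper's point is precisely that such algorithms perform poorly when rewards are instead driven by a single evolving Markov process.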
Year: 2018
Venue: arXiv: Learning
Field: Markov process, Incentive, Regret, A priori and a posteriori, Artificial intelligence, Mathematics, Machine learning, Reinforcement learning
DocType:
Volume: abs/1803.04008
Citations: 0
Journal:
PageRank: 0.34
References: 14
Authors: 4
Name                 Order   Citations   PageRank
Lillian J. Ratliff   1       872         3.32
Shreyas Sekar        2       26          7.56
Liyuan Zheng         3       0           0.68
Tanner Fiez          4       2           1.49