Title
Exploiting Correlation in Finite-Armed Structured Bandits.
Abstract
We consider a structured multi-armed bandit problem in which the mean rewards of different arms are related through a hidden parameter. We propose an approach that allows generalization of classical bandit algorithms such as UCB and Thompson sampling to the structured bandit setting. Our approach is based on exploiting the structure in the problem to identify some arms as sub-optimal and pull them only O(1) times. This results in a significant reduction in cumulative regret, and in fact our algorithm achieves bounded (i.e., O(1)) regret whenever possible. We empirically demonstrate the superiority of our algorithms via simulations and experiments on the MovieLens dataset. Moreover, the problem setting we study in this paper subsumes several previously studied frameworks such as Global, Regional, and Structured bandits with linear rewards.
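The abstract describes the approach only at a high level. Below is a minimal Python sketch of one way a UCB-style index policy can exploit such structure, assuming the mean-reward functions mu_k(theta) are known and the hidden parameter theta lies on a finite grid; the function structured_ucb, the grid, the confidence radius, and the competitive-arm rule are illustrative assumptions, not the paper's exact algorithm. The idea mirrors the abstract: arms that cannot be optimal for any parameter value consistent with the observed rewards are treated as sub-optimal, so they end up being pulled only a constant number of times, while UCB is run over the remaining arms.

# A minimal sketch (not the paper's exact algorithm) of a structured UCB rule:
# mean rewards mu_k(theta) share a hidden parameter theta. Keep a confidence set
# for theta, mark arms that are not optimal for any plausible theta as
# sub-optimal, and run UCB only over the remaining "competitive" arms.
# The function name, the theta grid, and the confidence radius are assumptions
# made for illustration.
import numpy as np

def structured_ucb(mu_fns, theta_grid, true_theta, horizon, noise_std=1.0, seed=0):
    rng = np.random.default_rng(seed)
    K = len(mu_fns)
    counts = np.zeros(K)   # number of pulls per arm
    sums = np.zeros(K)     # sum of observed rewards per arm
    for t in range(1, horizon + 1):
        if t <= K:
            # Pull each arm once before using confidence intervals.
            arm = t - 1
        else:
            means = sums / counts
            radius = np.sqrt(2 * np.log(t) / counts)   # standard UCB radius
            # Confidence set for theta: values consistent with every arm's interval.
            plausible = [th for th in theta_grid
                         if all(abs(mu_fns[k](th) - means[k]) <= radius[k]
                                for k in range(K))]
            if not plausible:                # fall back if the set becomes empty
                plausible = list(theta_grid)
            # Competitive arms: optimal for at least one plausible theta.
            competitive = {int(np.argmax([f(th) for f in mu_fns])) for th in plausible}
            ucb = means + radius
            arm = max(competitive, key=lambda k: ucb[k])
        reward = mu_fns[arm](true_theta) + noise_std * rng.normal()
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Example: two arms whose means are coupled through theta in [0, 1].
if __name__ == "__main__":
    mu_fns = [lambda th: th, lambda th: 1.0 - th]
    pulls = structured_ucb(mu_fns, theta_grid=np.linspace(0, 1, 101),
                           true_theta=0.8, horizon=2000)
    print("pulls per arm:", pulls)

Running the example prints the number of pulls per arm; under this sketch the pull count of the sub-optimal arm stays small as the horizon grows, which is the behavior the abstract refers to as bounded regret whenever possible.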
Year
2018
Venue
arXiv: Machine Learning
Field
Mathematical optimization, Regret, MovieLens, Thompson sampling, Correlation, Mathematics, Bounded function
DocType
Journal
Volume
abs/1810.08164
Citations
0
PageRank
0.34
References
0
Authors
3
Name            Order   Citations   PageRank
Samarth Gupta   1       13          5.60
Gauri Joshi     2       308         29.70
Osman Yagan     3       430         43.65