Abstract |
---|
We study a variant of the stochastic multi-armed bandit problem where the set of available arms varies arbitrarily with time (also known as the sleeping bandit problem). We focus on the Thompson Sampling algorithm and consider a regret notion defined with respect to the best available arm. Our main result is an O(log T) regret bound for Thompson Sampling, which generalizes a similar bound known for this algorithm from the classical bandit setting. Our bound also matches (up to constants) the best-known lower bound for the sleeping bandit problem. We show via simulations that Thompson Sampling outperforms the UCB-style AUER algorithm for the sleeping bandit problem. |
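The abstract's setting can be illustrated with a minimal sketch of Thompson Sampling restricted to the awake arms in each round. This is an illustrative implementation with Bernoulli rewards and Beta posteriors, not the paper's exact algorithm or analysis; the function names `rewards_fn` and `available_fn` are hypothetical placeholders for the environment.

```python
import random

def thompson_sampling_sleeping(rewards_fn, available_fn, n_arms, horizon, seed=0):
    """Thompson Sampling with Beta(1, 1) priors, restricted each round
    to the arms reported as available (awake) by `available_fn`.
    Illustrative sketch only; assumes Bernoulli rewards."""
    rng = random.Random(seed)
    successes = [1] * n_arms  # Beta prior pseudo-counts
    failures = [1] * n_arms
    total_reward = 0
    for t in range(horizon):
        awake = available_fn(t)  # subset of arms available at round t
        # Sample a mean from each awake arm's posterior; play the argmax.
        samples = {a: rng.betavariate(successes[a], failures[a]) for a in awake}
        arm = max(samples, key=samples.get)
        r = rewards_fn(arm, t)   # 0/1 reward for the chosen arm
        successes[arm] += r
        failures[arm] += 1 - r
        total_reward += r
    return total_reward
```

Because only awake arms are sampled, the regret is measured against the best arm that was actually available each round, matching the regret notion used in the abstract.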
Year | Venue | Field |
---|---|---|
2017 | Conference on Uncertainty in Artificial Intelligence (UAI 2017) | Computer science, Thompson sampling, Artificial intelligence, Machine learning
DocType | Citations | PageRank
---|---|---
Conference | 1 | 0.36

References | Authors
---|---
8 | 5
Name | Order | Citations | PageRank |
---|---|---|---|
Aritra Chatterjee | 1 | 1 | 0.36 |
Ganesh Ghalme | 2 | 1 | 1.71 |
Shweta Jain | 3 | 15 | 2.36 |
Rohit Vaish | 4 | 34 | 3.23
Y. Narahari | 5 | 699 | 98.97 |