Title | ||
---|---|---|
A Near-Optimal Change-Detection Based Algorithm For Piecewise-Stationary Combinatorial Semi-Bandits |
Abstract | ||
---|---|---|
We investigate the piecewise-stationary combinatorial semi-bandit problem. Compared to the original combinatorial semi-bandit problem, our setting assumes the reward distributions of base arms may change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, GLR-CUCB, which incorporates an efficient combinatorial semi-bandit algorithm. CUCB, with an almost parameter-free change-point detector, the Generalized Likelihood Ratio Test (GLRT). Our analysis shows that the regret of GLR-CUCB is upper bounded by O(root NKT log T), where N is the number of piecewise-stationary segments, K is the number of base arms, and T is the number of time steps. As a complement, we also derive a nearly matching regret lower bound on the order of Omega(root NKT). for both piecewise-stationary multi-armed bandits and combinatorial semi-bandits, using information-theoretic techniques and judiciously constructed piecewise-stationary bandit instances. Our lower bound is tighter than the best available regret lower bound, which is Omega(root T). Numerical experiments on both synthetic and real-world datasets demonstrate the superiority of GLR-CUCB compared to other state-of-the-art algorithms. |
Year | Venue | DocType |
---|---|---|
2020 | THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | Conference |
Volume | ISSN | Citations |
34 | 2159-5399 | 0 |
PageRank | References | Authors |
0.34 | 0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Huozhi Zhou | 1 | 0 | 2.03 |
Lingda Wang | 2 | 0 | 3.04 |
Lav R. Varshney | 3 | 4 | 2.83 |
Ee-Peng Lim | 4 | 5889 | 754.17 |