Abstract |
---|
The selection of leaders in leader-follower multi-agent systems can be naturally formulated as a matroid optimization problem. In this paper, we investigate the online, stochastic version of this problem, where in each round we select a set of leaders and then observe a random realization of the corresponding reward, i.e., of the system performance. This problem is referred to as a stochastic matroid bandit, a variant of the combinatorial multi-armed bandit problem in which the underlying combinatorial structure is a matroid. We consider semi-bandit feedback and Bernoulli rewards, and derive a tight, problem-dependent lower bound on the regret of any consistent algorithm. We propose KL-OSM, a computationally efficient algorithm that exploits the matroid structure. We derive a finite-time upper bound on the regret of KL-OSM that improves on the performance guarantees of existing algorithms. This upper bound matches our lower bound, i.e., KL-OSM is asymptotically optimal. Numerical experiments attest that KL-OSM outperforms state-of-the-art algorithms in practice, in some cases significantly. |
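The setting described in the abstract can be sketched in code. The following is an illustrative KL-UCB-style index policy for a matroid semi-bandit, not the authors' KL-OSM implementation: the restriction to a uniform matroid of a given rank, the arm means, and the function names (`run`, `kl_ucb_index`) are assumptions made for this sketch. For a uniform matroid, the greedy max-weight basis is simply the `rank` arms with the highest indices; each round the learner plays such a basis and receives semi-bandit feedback, i.e., one Bernoulli draw per played arm.

```python
import math
import random

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, n, t):
    """Largest q >= p_hat with n * kl(p_hat, q) <= log(t), via bisection."""
    if n == 0:
        return 1.0  # unexplored arms get the maximal index
    bound = math.log(max(t, 2)) / n
    lo, hi = p_hat, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if kl_bernoulli(p_hat, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo

def run(means, rank, horizon, seed=0):
    """Play a KL-UCB-style index policy on a uniform matroid of the given rank."""
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # number of times each arm was played
    sums = [0.0] * n_arms   # cumulative reward of each arm
    total = 0.0
    for t in range(1, horizon + 1):
        idx = [kl_ucb_index(sums[i] / counts[i] if counts[i] else 0.0,
                            counts[i], t) for i in range(n_arms)]
        # For a uniform matroid, the greedy max-weight basis is simply
        # the `rank` arms with the highest indices.
        basis = sorted(range(n_arms), key=lambda i: idx[i], reverse=True)[:rank]
        for i in basis:  # semi-bandit feedback: one Bernoulli draw per played arm
            r = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            sums[i] += r
            total += r
    return total, counts
```

Over a long horizon, the play counts concentrate on the optimal basis (the `rank` arms with the highest means), which is the behavior whose rate the paper's lower and upper bounds characterize.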
Year | DOI | Venue |
---|---|---|
2016 | 10.5555/2936924.2937004 | AAMAS |
Keywords | Field | DocType |
Multi-Armed Bandits, Online Learning, Combinatorial Optimization, Matroids, Regret Analysis | Upper and lower bounds, Computer science, Matroid partitioning, Artificial intelligence, Weighted matroid, Optimization problem, Asymptotically optimal algorithm, Matroid, Mathematical optimization, Oriented matroid, Algorithm, Combinatorial optimization, Machine learning | Conference
Citations | PageRank | References |
2 | 0.39 | 11 |
Authors |
---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sadegh Talebi | 1 | 64 | 11.41 |
A. Proutière | 2 | 673 | 51.18