Abstract |
---|
We develop a new approach, named Greedy when Sure and Conservative when Uncertain (GSCU), for competing online against unknown and nonstationary opponents. GSCU improves on prior approaches in four ways: 1) it introduces a novel way of learning opponent policy embeddings offline; 2) it trains offline a single best response, conditioned additionally on the opponent policy embedding, instead of a finite set of separate best responses to individual opponents; 3) it computes online a posterior over the current opponent's policy embedding, avoiding the discrete and often ineffective decision of which type the current opponent belongs to; and 4) it selects online between a real-time greedy policy and a fixed conservative policy via an adversarial bandit algorithm, achieving a theoretically better regret than adhering to either alone. Experimental studies on popular benchmarks demonstrate GSCU's superiority over state-of-the-art methods. The code is available online at \url{https://github.com/YeTianJHU/GSCU}. |
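The online policy selection described in point 4 can be sketched with a standard two-arm EXP3 adversarial bandit, where one arm plays the greedy policy and the other the conservative policy. The sketch below is a minimal illustration of generic EXP3, not the paper's exact formulation; the `gamma` exploration rate and the assumption of rewards scaled to [0, 1] are my own choices.

```python
import math
import random

def exp3_select(weights, gamma):
    """Mix the normalized weights with uniform exploration and sample an arm."""
    k = len(weights)
    total = sum(weights)
    probs = [(1 - gamma) * w / total + gamma / k for w in weights]
    r, cum = random.random(), 0.0
    for arm, p in enumerate(probs):
        cum += p
        if r <= cum:
            return arm, probs
    return k - 1, probs

def exp3_update(weights, probs, arm, reward, gamma):
    """Exponentially reweight the pulled arm with an importance-weighted
    reward estimate; reward is assumed to be scaled to [0, 1]."""
    k = len(weights)
    estimate = reward / probs[arm]  # unbiased estimate of the arm's reward
    weights[arm] *= math.exp(gamma * estimate / k)
    return weights
```

In GSCU's setting, arm 0 would correspond to the real-time greedy (best-response) policy and arm 1 to the fixed conservative policy; EXP3's regret bound holds even against an adaptive adversary, which is what makes a bandit of this type suitable for choosing between the two online.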
Year | Venue | DocType |
---|---|---
2022 | International Conference on Machine Learning | Conference |
Citations | PageRank | References
---|---|---
0 | 0.34 | 0
Authors (11)
Name | Order | Citations | PageRank |
---|---|---|---
Haobo Fu | 1 | 5 | 1.11 |
Ye Tian | 2 | 0 | 0.34 |
Hongxiang Yu | 3 | 0 | 0.34 |
Weiming Liu | 4 | 0 | 1.01 |
Shuang Wu | 5 | 0 | 0.68 |
Jiechao Xiong | 6 | 0 | 0.34 |
Ying Wen | 7 | 0 | 2.37 |
Kai Li | 8 | 0 | 0.34 |
Junliang Xing | 9 | 1193 | 63.31 |
Qiang Fu | 10 | 1 | 4.42 |
Wei Yang | 11 | 0 | 0.34 |