Abstract |
---|
This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have focused on two-player zero-sum games, a regime wherein Nash equilibria are tractably computable. In moving from two-player zero-sum games to more general settings, computation of Nash equilibria quickly becomes infeasible. Here, we extend the theoretical underpinnings of PSRO by considering an alternative solution concept, α-Rank, which is unique (and thus, unlike Nash, faces no equilibrium selection issues) and applies readily to general-sum, many-player settings. We establish convergence guarantees in several game classes and identify links between Nash equilibria and α-Rank. We demonstrate the competitive performance of α-Rank-based PSRO against an exact Nash solver-based PSRO in 2-player Kuhn and Leduc poker. We then go beyond the reach of prior PSRO applications by considering 3- to 5-player poker games, yielding instances where α-Rank achieves faster convergence than approximate Nash solvers, thus establishing it as a favorable general games solver. We also carry out an initial empirical validation in MuJoCo soccer, illustrating the feasibility of the proposed approach in another complex domain. |
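
To make the solution concept above more concrete, here is a minimal, non-authoritative sketch of multi-population α-Rank on a small N-player normal-form game. It assumes a finite ranking intensity `alpha`, a finite population size `m`, uniform mutation over unilateral deviations, and a Fermi-rule fixation probability; the function names `alpharank` and `fixation_prob` and the defaults `alpha=10`, `m=50` are illustrative choices, not taken from the paper. It ranks a fixed game only and does not show the PSRO loop or its oracle.

```python
import itertools

import numpy as np


def fixation_prob(x, m):
    """Fermi-rule fixation probability of one mutant in a population of size m.

    x = alpha * (mutant payoff - resident payoff). Written so that large |x|
    does not overflow (very negative x may underflow to 0.0).
    """
    if abs(x) < 1e-12:
        return 1.0 / m  # neutral drift
    if x > 0:
        return np.expm1(-x) / np.expm1(-m * x)
    # x < 0: multiply numerator and denominator by exp(m * x) to stay finite
    return np.exp((m - 1) * x) * np.expm1(x) / np.expm1(m * x)


def alpharank(payoffs, alpha=10.0, m=50):
    """Stationary distribution of a multi-population alpha-Rank chain (sketch).

    payoffs: one array per player; payoffs[k][profile] is player k's payoff and
    every array has shape (|S_1|, ..., |S_N|). Returns an array of that shape.
    Assumption: the chain has a single recurrent class. Very large alpha can
    underflow transition rates to zero and make the linear system singular.
    """
    shape = payoffs[0].shape
    profiles = list(itertools.product(*(range(n) for n in shape)))
    index = {p: i for i, p in enumerate(profiles)}
    eta = 1.0 / sum(n - 1 for n in shape)  # uniform mutation rate
    C = np.zeros((len(profiles), len(profiles)))
    for s in profiles:
        i = index[s]
        for k, n_k in enumerate(shape):
            for a in range(n_k):
                if a == s[k]:
                    continue
                sigma = s[:k] + (a,) + s[k + 1:]  # player k deviates to a
                x = alpha * (payoffs[k][sigma] - payoffs[k][s])
                C[i, index[sigma]] = eta * fixation_prob(x, m)
        C[i, i] = 1.0 - C[i].sum()  # remaining probability: stay put
    # Solve pi C = pi with sum(pi) = 1 by replacing one balance equation.
    A = C.T - np.eye(len(profiles))
    A[-1, :] = 1.0
    b = np.zeros(len(profiles))
    b[-1] = 1.0
    return np.linalg.solve(A, b).reshape(shape)


if __name__ == "__main__":
    # Prisoner's dilemma, strategy 0 = cooperate, 1 = defect.
    pd = [np.array([[3.0, 0.0], [5.0, 1.0]]), np.array([[3.0, 5.0], [0.0, 1.0]])]
    print(alpharank(pd))
```

In the prisoner's dilemma, nearly all stationary mass should land on mutual defection, since every unilateral move away from it lowers the deviator's payoff. Within α-Rank-based PSRO, a ranking like this would be computed over the restricted meta-game built from the current populations.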
Year | Venue | Keywords |
---|---|---|
2020 | ICLR | multiagent learning, game theory, training, games |

DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |

References | Authors |
---|---|
28 | 15 |

Name | Order | Citations | PageRank |
---|---|---|---|
Paul Muller | 1 | 1 | 1.70 |
Shayegan Omidshafiei | 2 | 60 | 10.34 |
Mark Rowland | 3 | 49 | 7.39 |
Karl Tuyls | 4 | 1272 | 127.83 |
Julien Perolat | 5 | 75 | 12.64 |
Siqi Liu | 6 | 55 | 4.94 |
Daniel Hennes | 7 | 135 | 18.46 |
Luke Marris | 8 | 28 | 2.08 |
Marc Lanctot | 9 | 2121 | 97.97 |
Edward Hughes | 10 | 26 | 7.67 |
Zhe Wang | 11 | 5 | 1.44 |
Guy Lever | 12 | 108 | 7.07 |
Nicolas Heess | 13 | 1762 | 94.77 |
Thore Graepel | 14 | 5 | 4.10 |
Rémi Munos | 15 | 2240 | 157.06 |