Abstract |
---|
Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and StarCraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game-theoretic equilibrium analysis shows that the new process yields consistent improvements. |
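The abstract names two algorithmic ingredients: an approximate best-response operator and policy iteration schemes that approximate fictitious play. As a rough illustration of the latter idea only, and not the paper's actual algorithm, here is a minimal Python sketch; `approx_best_response` and `sample_opponent` are hypothetical stand-ins for the learned best-response operator and opponent sampling.

```python
import random

def fictitious_play_policy_iteration(initial_policy, approx_best_response,
                                     num_iterations):
    """Minimal sketch of fictitious-play-style policy iteration.

    Each iteration computes an (approximate) best response to the uniform
    mixture over all policies produced so far; the growing pool plays the
    role of the average strategy that fictitious play converges with.
    """
    pool = [initial_policy]
    for _ in range(num_iterations):
        # Opponents play the uniform mixture over past iterates: in
        # practice, one policy is sampled from the pool per episode.
        sample_opponent = lambda: random.choice(pool)
        # `approx_best_response` is a hypothetical stand-in for the learned
        # best-response operator (e.g. an RL run against sampled opponents).
        pool.append(approx_best_response(sample_opponent))
    # The uniform mixture over `pool` approximates an equilibrium strategy.
    return pool
```

In the setting the abstract describes, the best response itself must be approximated (here via RL), since Diplomacy's large combinatorial action space and simultaneous moves make exact best responses intractable.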
Year | Venue | DocType | Volume | Citations | PageRank | References | Authors
---|---|---|---|---|---|---|---
2020 | NIPS 2020 | Conference | 33 | 0 | 0.34 | 0 | 13
Name | Order | Citations | PageRank |
---|---|---|---|
Thomas Anthony | 1 | 23 | 3.03
Tom Eccles | 2 | 17 | 5.77 |
Andrea Tacchetti | 3 | 138 | 9.57 |
János Kramár | 4 | 63 | 4.26 |
Ian M. Gemp | 5 | 16 | 6.37 |
Thomas C. Hudson | 6 | 404 | 37.29 |
Nicolas Porcel | 7 | 0 | 0.34
Marc Lanctot | 8 | 2121 | 97.97 |
Julien Perolat | 9 | 75 | 12.64 |
Richard Everett | 10 | 0 | 0.68
Satinder P. Singh | 11 | 5508 | 715.52 |
Thore Graepel | 12 | 4211 | 242.71 |
Yoram Bachrach | 13 | 1262 | 79.07 |