Abstract |
---|
Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and StarCraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game-theoretic equilibrium analysis shows that the new process yields consistent improvements. |
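The abstract names two algorithmic ingredients: an approximate best-response operator and policy iteration schemes that approximate fictitious play. As a rough illustration of the latter idea only, and not the paper's actual algorithm, here is a minimal Python sketch; `approx_best_response` and `sample_opponent` are hypothetical stand-ins for the learned best-response operator and opponent sampling.

```python
import random

def fictitious_play_policy_iteration(initial_policy, approx_best_response,
                                     num_iterations):
    """Minimal sketch of fictitious-play-style policy iteration.

    Each iteration computes an (approximate) best response to the uniform
    mixture over all policies produced so far; the growing pool plays the
    role of the average strategy that fictitious play converges with.
    """
    pool = [initial_policy]
    for _ in range(num_iterations):
        # Opponents play the uniform mixture over past iterates: in
        # practice, one policy is sampled from the pool per episode.
        sample_opponent = lambda: random.choice(pool)
        # `approx_best_response` is a hypothetical stand-in for the learned
        # best-response operator (e.g. an RL run against sampled opponents).
        pool.append(approx_best_response(sample_opponent))
    # The uniform mixture over `pool` approximates an equilibrium strategy.
    return pool
```

In the setting the abstract describes, the best response itself must be approximated (here via RL), since Diplomacy's large combinatorial action space and simultaneous moves make exact best responses intractable.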
Year | Venue | DocType | Volume | Citations | PageRank | References | Authors
---|---|---|---|---|---|---|---
2020 | NIPS 2020 | Conference | 33 | 0 | 0.34 | 0 | 13
Name | Order | Citations | PageRank |
---|---|---|---|
Thomas Anthony | 1 | 23 | 3.03
Tom Eccles | 2 | 17 | 5.77 |
Andrea Tacchetti | 3 | 138 | 9.57 |
János Kramár | 4 | 63 | 4.26 |
Ian M. Gemp | 5 | 16 | 6.37 |
Thomas C. Hudson | 6 | 404 | 37.29 |
Nicolas Porcel | 7 | 0 | 0.34
Marc Lanctot | 8 | 2121 | 97.97 |
Julien Perolat | 9 | 75 | 12.64 |
Richard Everett | 10 | 0 | 0.68
Satinder P. Singh | 11 | 5508 | 715.52 |
Thore Graepel | 12 | 4211 | 242.71 |
Yoram Bachrach | 13 | 1262 | 79.07 |