Title
Actor-Critic Policy Optimization in a Large-Scale Imperfect-Information Game
Abstract
Deep policy gradient methods have demonstrated promising results in many large-scale games, where the agent learns purely from its own experience. Yet, policy gradient methods with self-play struggle to converge to a Nash Equilibrium (NE) in multi-agent settings. Counterfactual regret minimization (CFR) is guaranteed to converge to a NE in 2-player zero-sum games, but it usually requires domain-specific abstractions to handle large-scale games. Inheriting merits from both approaches, in this paper we extend the actor-critic framework in deep reinforcement learning to tackle a large-scale 2-player zero-sum imperfect-information game, 1-on-1 Mahjong, whose information-set size and game length are much larger than those of poker. The proposed algorithm, named Actor-Critic Hedge (ACH), replaces the usual policy optimization objective of maximizing discounted returns with minimizing a type of weighted cumulative counterfactual regret. This is achieved by approximating the regret with a deep neural network and minimizing it by generating self-play policies using Hedge. ACH is theoretically justified, as it is derived from a neural-based weighted CFR for which we prove convergence to a NE under certain conditions. Experimental results on the proposed 1-on-1 Mahjong benchmark and on benchmarks from the literature demonstrate that ACH outperforms related state-of-the-art methods. Moreover, the agent trained by ACH defeats a human champion in 1-on-1 Mahjong.
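For intuition, the Hedge rule mentioned in the abstract turns cumulative regret estimates at an information set into a policy via an exponential weighting (a softmax). The sketch below is illustrative only and is not the paper's implementation; the function name hedge_policy, the learning-rate parameter eta, and the assumption that a network supplies per-action regret estimates are hypothetical.

    import numpy as np

    def hedge_policy(regrets, eta=1.0):
        # Hedge: map (estimated) cumulative counterfactual regrets for the
        # actions at an information set to action probabilities via a
        # softmax with learning rate eta. `regrets` stands in for the
        # per-action output of a regret-approximating network.
        logits = eta * np.asarray(regrets, dtype=float)
        logits -= logits.max()          # subtract max for numerical stability
        weights = np.exp(logits)
        return weights / weights.sum()

    # Example: three actions with estimated regrets; the resulting
    # probabilities sum to 1 and favor the highest-regret action.
    policy = hedge_policy([0.2, -0.5, 1.3], eta=2.0)
    print(policy)

In this view, the actor is trained so that its policy tracks the Hedge distribution induced by the regret estimates, rather than directly maximizing discounted returns.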
Year
2022
Venue
International Conference on Learning Representations (ICLR)
Keywords
Policy Optimization, Nash Equilibrium, Mahjong AI
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
11
Name             Order  Citations  PageRank
Haobo Fu         1      5          1.11
Weiming Liu      2      0          1.01
Shuang Wu        3      0          0.68
Yijia Wang       4      0          0.34
Tao Yang         5      5          8.53
Kai Li           6      0          0.34
Junliang Xing    7      1193       63.31
Bin Li           8      782        79.80
Bo Ma            9      0          0.34
Qiang Fu         10     1          4.42
Wei Yang         11     93         27.50