Title
Adversarial Policy Gradient for Alternating Markov Games
Abstract
Policy gradient reinforcement learning has been applied to two-player alternate-turn zero-sum games; e.g., in AlphaGo, self-play REINFORCE was used to improve the neural network model after supervised learning. In this paper, we emphasize that two-player zero-sum games with alternating turns, previously formulated as Alternating Markov Games (AMGs), differ from standard MDPs because of their two-agent nature. We exploit the difference in the associated Bellman equations, which leads to different policy iteration algorithms. Since policy gradient methods are a form of generalized policy iteration, we show how these differences in policy iteration are reflected in policy gradient for AMGs. We formulate an adversarial policy gradient and discuss possibilities for developing policy gradient methods beyond self-play REINFORCE. The core idea is to estimate the minimum rather than the mean for the “critic”. Experimental results on the game of Hex show that the modified Monte Carlo policy gradient methods learn better pure neural net policies than the REINFORCE variants. To apply learned neural weights to Hex on multiple board sizes, we describe a board-size-independent neural net architecture. We show that, when combined with search, the resulting program, using a single neural net model, consistently beats MoHex 2.0, the state-of-the-art computer Hex player, on board sizes from 9×9 to 13×13.
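The core idea stated in the abstract, using the minimum rather than the mean of sampled returns as the policy-gradient "critic", can be sketched in a few lines. The following is a minimal, self-contained illustration under assumed interfaces (a tabular softmax policy and hand-picked rollout returns); it is not the paper's actual algorithm or implementation, only a sketch of the pessimistic-critic idea.

```python
import numpy as np

def adversarial_critic(rollout_returns):
    """Pessimistic value estimate: the minimum sampled return rather than the mean."""
    return float(np.min(rollout_returns))

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(logits, action, critic_value, lr=0.1):
    """One REINFORCE step for a tabular softmax policy:
    grad of log pi(a) w.r.t. the logits is onehot(a) - pi, scaled by the critic value."""
    pi = softmax(logits)
    grad_log_pi = -pi
    grad_log_pi[action] += 1.0
    return logits + lr * critic_value * grad_log_pi

# Illustrative (made-up) numbers: returns of four rollouts, one per sampled
# opponent reply, after the first player plays move 0 from some position.
returns_after_move_0 = [1.0, -1.0, 1.0, 1.0]

logits = np.zeros(3)  # toy policy over 3 candidate moves
v_min = adversarial_critic(returns_after_move_0)   # -1.0 (adversarial estimate)
v_mean = float(np.mean(returns_after_move_0))      #  0.5 (self-play REINFORCE style)
logits = reinforce_update(logits, action=0, critic_value=v_min)
print(v_min, v_mean, logits)  # move 0 is discouraged under the pessimistic estimate
```

Under the mean estimate the move would look favorable, while the minimum over opponent replies discourages it; this contrast is the distinction the abstract draws between self-play REINFORCE and the adversarial variant.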
Year
2018
Venue
International Conference on Learning Representations
Field
Gradient method, Monte Carlo method, Computer science, Markov chain, Supervised learning, Bellman equation, Exploit, Artificial intelligence, Artificial neural network, Machine learning, Reinforcement learning
DocType
Conference
Citations
1
PageRank
0.36
References
0
Authors
3
Name | Order | Citations | PageRank
Chao Gao | 1 | 42 | 5.78
Martin Müller | 2 | 549 | 68.48
Ryan B. Hayward | 3 | 312 | 44.97