Abstract
---

DeepMind's recent spectacular success in using deep convolutional neural networks and machine learning to build superhuman-level agents, e.g. for Atari games via deep Q-learning and for the game of Go via other deep reinforcement learning methods, raises many questions, including to what extent these methods will succeed in other domains. In this paper we consider deep Q-learning (DQL) for the game of Hex: after supervised initialization, we use self-play to train NeuroHex, an 11-layer convolutional neural network that plays Hex on the 13×13 board. Hex is the classic two-player alternate-turn stone-placement game, played on a rhombus of hexagonal cells, in which the winner is whoever connects their two opposing sides. Despite the large action and state spaces, our system trains a Q-network capable of strong play with no search. After two weeks of Q-learning, NeuroHex achieves respective win rates of 20.4% as first player and 2.1% as second player against a 1-second-per-move version of MoHex, the current ICGA Olympiad Hex champion. Our data suggest that further improvement might be possible with more training time.
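
For context, the training setup the abstract describes (a convolutional Q-network producing one value per board cell, updated by Q-learning on self-play transitions) might look roughly as follows. This is a minimal sketch, assuming PyTorch; the channel width, the two-plane board encoding, and the negamax-style bootstrap target are assumptions, since the abstract states only the board size, the 11-layer depth, and that Q-learning follows a supervised initialization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

BOARD = 13  # board size from the abstract


class QNet(nn.Module):
    """Illustrative 11-layer convolutional Q-network for Hex.

    Input encoding (two planes, one per player's stones) and the
    channel width are assumptions; only the depth is from the paper.
    """

    def __init__(self, channels=64, depth=11):
        super().__init__()
        layers = [nn.Conv2d(2, channels, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(channels, 1, 1)]  # one Q-value per cell
        self.net = nn.Sequential(*layers)

    def forward(self, boards):              # boards: (N, 2, 13, 13)
        return self.net(boards).flatten(1)  # Q-values: (N, 169)


def q_update(qnet, target_net, optimizer, s, a, r, s2, done, gamma=1.0):
    """One Q-learning step on a batch of self-play transitions."""
    q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # The opponent moves between s and s2, so the bootstrapped value
        # is negated (a negamax convention; an assumption of this sketch).
        bootstrap = -target_net(s2).max(1).values
        target = r + gamma * (1.0 - done) * bootstrap
    loss = F.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a win/loss game such as Hex the reward is the terminal game outcome, so this sketch defaults to an undiscounted target (gamma=1.0); the paper's actual hyperparameters are not given in the abstract.
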
Year | DOI | Venue
---|---|---|
2016 | 10.1007/978-3-319-57969-6_1 | Communications in Computer and Information Science

Keywords | Field | DocType
---|---|---|
Optimal Policy, Reinforcement Learning, Gradient Descent, Convolutional Neural Network, Policy Network | Convolutional neural network, Computer science, Olympiad, Q-learning, Champion, Artificial intelligence, Initialization, Artificial neural network, State space, Machine learning, Reinforcement learning | Conference

Volume | ISSN | Citations
---|---|---|
705 | 1865-0929 | 1

PageRank | References | Authors
---|---|---|
0.36 | 4 | 3

Name | Order | Citations | PageRank |
---|---|---|---|
Kenny Young | 1 | 2 | 1.39 |
Gautham Vasan | 2 | 5 | 1.54 |
Ryan Hayward | 3 | 2 | 1.09 |