Title
Alternative Loss Functions in AlphaZero-like Self-play
Abstract
Recently, AlphaZero has achieved outstanding performance in playing Go, Chess, and Shogi. Players in AlphaZero consist of a combination of Monte Carlo Tree Search and a deep neural network that is trained using self-play. The unified deep neural network has a policy head and a value head, and during training, the optimizer minimizes the sum of policy loss and value loss. However, it is not clear if and under which circumstances other formulations of the loss function are better. Therefore, we perform experiments with different combinations of these two minimization targets. In contrast to many recent papers that adopt single-run experiments and use whole-history Elo ratings from self-play, we propose to use repeated runs. The results show that the whole-history Elo rating describes the training performance quite well within each training run, but it carries a high self-play bias, so the ratings are not comparable across different training runs. Therefore, inspired by the AlphaGo series of papers, we adopt a performance assessment that avoids self-play bias, the final best player Elo rating, to evaluate playing strength in direct competition between the evolved players. For relatively small games, based on this new evaluation method, surprisingly, minimizing only the value loss achieves the strongest playing strength in the round-robin tournament of final best players. These results indicate that more research is needed into the relative importance of the value function and the policy function in small games.
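The loss combinations described in the abstract can be illustrated with a minimal sketch. It assumes the usual AlphaZero-style definitions, which the abstract itself does not spell out: a cross-entropy policy loss between the search distribution `pi` and the network policy `p`, and a squared-error value loss between the game outcome `z` and the predicted value `v`. The function names and the `mode` argument are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def policy_loss(pi, p, eps=1e-12):
    """Cross-entropy between the target distribution pi and the predicted policy p.

    Assumed form; eps guards against log(0).
    """
    return -sum(t * math.log(q + eps) for t, q in zip(pi, p))


def value_loss(z, v):
    """Squared error between the game outcome z and the predicted value v (assumed form)."""
    return (z - v) ** 2


def combined_loss(pi, p, z, v, mode="sum"):
    """Return the training target for one position.

    mode is a hypothetical switch:
      "sum"    -> policy loss + value loss (the default AlphaZero target)
      "policy" -> policy loss only
      "value"  -> value loss only
    """
    lp = policy_loss(pi, p)
    lv = value_loss(z, v)
    if mode == "sum":
        return lp + lv
    if mode == "policy":
        return lp
    if mode == "value":
        return lv
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    pi = [0.5, 0.5]   # target move distribution (e.g. from tree search)
    p = [0.5, 0.5]    # network policy output
    z, v = 1.0, 0.5   # game outcome and predicted value
    for mode in ("sum", "policy", "value"):
        print(mode, combined_loss(pi, p, z, v, mode))
```

Switching `mode` changes only which term the optimizer sees; the network would still produce both heads, which is what makes a value-only target comparable to the default sum in a tournament between trained players.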
Year: 2019
DOI: 10.1109/SSCI44817.2019.9002814
Venue: 2019 IEEE Symposium Series on Computational Intelligence (SSCI)
Keywords: AlphaZero-like self-play, loss combination, Elo evaluation
Field: Monte Carlo tree search, Mathematical optimization, Tournament, Computer science, Bellman equation, Minification, Artificial neural network
DocType: Conference
ISBN: 978-1-7281-2486-5
Citations: 1
PageRank: 0.35
References: 14
Authors: 4
Name
Order
Citations
PageRank
Hui Wang110.69
Michael Emmerich2124371.89
Preuss Mike393381.70
Aske Plaat452472.18