Generalized multiagent learning with performance bound - Citegraph

Paper Info

Title
Generalized multiagent learning with performance bound

Abstract
We present new Multiagent learning (MAL) algorithms with the general philosophy of policy convergence against some classes of opponents but otherwise ensuring high payoffs. We consider a 3-class breakdown of opponent types: (eventually) stationary, self-play and "other" (see Definition 4) agents. We start with ReDVaLeR that can satisfy policy convergence against the first two types and no-regret against the third, but it needs to know the type of the opponents. This serves as a baseline to delineate the difficulty of achieving these goals. We show that a simple modification on ReDVaLeR yields a new algorithm, RV 驴(t), that achieves no-regret payoffs in all games, and convergence to Nash equilibria in self-play (and to best response against eventually stationary opponents--a corollary of no-regret) simultaneously, without knowing the opponent types, but in a smaller class of games than ReDVaLeR . RV 驴(t) effectively ensures the performance of a learner during the process of learning, as opposed to the performance of a learned behavior. We show that the expression for regret of RV 驴(t) can have a slightly better form than those of other comparable algorithms like GIGA and GIGA-WoLF though, contrastingly, our analysis is in continuous time. Moreover, experiments show that RV 驴(t) can converge to an equilibrium in some cases where GIGA, GIGA-WoLF would fail, and to better equilibria where GIGA, GIGA-WoLF converge to undesirable equilibria (coordination games). This important class of coordination games also highlights the key desirability of policy convergence as a criterion for MAL in self-play instead of high average payoffs. To our knowledge, this is also the first successful (guaranteed) attempt at policy convergence of a no-regret algorithm in the Shapley game.

Year	DOI	Venue
2007	10.1007/s10458-007-9013-x	Autonomous Agents and Multi-Agent Systems
Keywords	DocType	Volume
Multiagent reinforcement learning,Game theory	Journal	15
Issue	ISSN	Citations
3	1387-2532	11
PageRank	References	Authors
0.56	27	2

Authors (2 rows)

Cited by (11 rows)

References (27 rows)

Name	Order	Citations	PageRank
Bikramjit Banerjee	1	284	32.63
Jing Peng	2	673	48.37

1