Abstract |
---|
We present a new method for learning good strategies in zero-sum Markov games in which each side is composed of multiple agents collaborating against an opposing team of agents. Our method requires full observability and communication during learning, but the learned policies can be executed in a distributed manner. The value function is represented as a factored linear architecture and its structure determines the necessary computational resources and communication bandwidth. This approach permits a tradeoff between simple representations with little or no communication between agents and complex, computationally intensive representations with extensive coordination between agents. Thus, we provide a principled means of using approximation to combat the exponential blowup in the joint action space of the participants. The approach is demonstrated with an example that shows the efficiency gains over naive enumeration. |
Year | Venue | Keywords
---|---|---
2002 | NIPS | col,value function

Field | DocType | Citations
---|---|---
Architecture, Observability, Exponential function, Computer science, Enumeration, Markov chain, Bellman equation, Communication bandwidth, Artificial intelligence, Machine learning | Conference | 5

PageRank | References | Authors
---|---|---
0.49 | 6 | 2
Name | Order | Citations | PageRank
---|---|---|---
Michail G. Lagoudakis | 1 | 1164 | 79.51
Ronald Parr | 2 | 2428 | 186.85