Title
Combining Model-Based Meta-Reasoning and Reinforcement Learning for Adapting Game-Playing Agents
Abstract
Human experience with interactive games will be enhanced if the software agents that play the game learn from their failures. Techniques such as reinforcement learning provide one way in which these agents may learn from their failures. Model-based meta-reasoning, a technique in which an agent uses a self-model for blame assignment, provides another. This paper evaluates a framework in which both these approaches are combined. We describe an experimental investigation of a specific task (defending a city) in a computer war strategy game called FreeCiv. Our results indicate that in the task examined, model-based meta-reasoning coupled with reinforcement learning enables the agent to learn the task with performance matching that of an expert-designed agent and with speed exceeding that of a pure reinforcement learning agent.

…but not necessarily identify the precise causes or the modifications needed to address them. They used reinforcement learning (RL) to complete the partial solutions generated by meta-reasoning: first, the agent used its self-model to localize the needed modifications to specific portions of its task structure, and then used Q-learning within those parts to identify the necessary modifications. In this work, instead of using reinforcement learning to identify the modifications necessary in a task model, we evaluate the hypothesis that model-based meta-reasoning may also be used to identify the appropriate RL space for a specific task. The learning space represented by combinations of all possible modifications to an agent's reasoning and knowledge can be too large for RL to work efficiently. One way in which this complexity can be addressed is through the decomposition of the learning problem into a series of smaller sub-problems (e.g. (Dietterich 1998)). This research examines how an agent may localize learning within such a decomposition through the use of model-based meta-reasoning. We evaluate this hypothesis in the context of game playing in a highly complex, non-deterministic, partially-observable environment.
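The scheme described above, in which meta-reasoning localizes a failure to one portion of the agent's task structure and Q-learning then searches only within that portion, can be illustrated with a minimal tabular Q-learning routine confined to a sub-task's state-action space. This is a hypothetical sketch, not code from the paper; the function name, the `step` and `reward` callbacks, and all hyperparameter values are illustrative assumptions.

```python
import random
from collections import defaultdict

def q_learning_subtask(states, actions, step, reward, episodes=500,
                       alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning restricted to the (small) state-action space of
    one sub-task, i.e. the portion of the task structure that the agent's
    self-model blamed for the failure (hypothetical interface)."""
    q = defaultdict(float)  # Q-values, default 0.0 for unseen pairs
    for _ in range(episodes):
        s = random.choice(states)          # start somewhere in the sub-task
        for _ in range(50):                # bounded episode length
            # epsilon-greedy action selection within the sub-task's actions
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: q[(s, act)])
            s2 = step(s, a)                # environment transition (assumed given)
            r = reward(s, a, s2)           # reward signal (assumed given)
            # standard Q-learning update
            best_next = max(q[(s2, a2)] for a2 in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q
```

In the paper's setting, `states` and `actions` would be drawn only from the sub-task that meta-reasoning identified, rather than from the full FreeCiv learning space; on a toy three-state chain where moving right toward the goal state yields reward, the learned Q-values favor the rightward action in every state.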
Year
2008
Venue
AIIDE
Keywords
software agent, reinforcement learning
Field
Computer science, Blame, Software agent, Multi-agent system, Artificial intelligence, Error-driven learning, Machine learning, Game playing, Reinforcement learning, Learning classifier system
DocType
Conference
Citations
8
PageRank
0.65
References
11
Authors
3
Name            Order  Citations  PageRank
Patrick Ulam    1      64         4.78
Joshua Jones    2      88         9.93
Ashok K. Goel   3      972        146.58