Abstract |
---|
Reinforcement learning (RL) in partially observable settings is challenging because the agent's observations are not Markov. Recently proposed methods can learn variable-order Markov models of the underlying process but have steep memory requirements and are sensitive to aliasing between observation histories due to sensor noise. This paper proposes dynamic-depth context tree weighting (D2-CTW), a model-learning method that addresses these limitations. D2-CTW dynamically expands a suffix tree while ensuring that the size of the model, but not its depth, remains bounded. We show that D2-CTW approximately matches the performance of state-of-the-art alternatives at stochastic time-series prediction while using at least an order of magnitude less memory. We also apply D2-CTW to model-based RL, showing that, on tasks that require memory of past observations, D2-CTW can learn without prior knowledge of a good state representation, or even the length of history upon which such a representation should depend. |
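For context on the method the abstract builds on: standard context tree weighting (CTW) maintains a suffix tree of contexts, scores each node with a Krichevsky-Trofimov (KT) estimator, and mixes deeper and shallower models by Bayesian weighting. The sketch below is a minimal fixed-depth binary CTW predictor, not the paper's D2-CTW (which additionally grows and bounds the tree dynamically); all class and method names here are illustrative assumptions.

```python
import math

def logsumexp2(a, b):
    """log(exp(a) + exp(b)) computed without overflow."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

class Node:
    def __init__(self):
        self.counts = [0, 0]  # zeros and ones observed at this context
        self.children = {}    # preceding symbol -> child Node
        self.log_pe = 0.0     # log KT (Krichevsky-Trofimov) estimate
        self.log_pw = 0.0     # log weighted (mixture) probability

class CTW:
    """Minimal binary context tree weighting with a fixed maximum depth."""
    def __init__(self, depth):
        self.depth = depth
        self.root = Node()
        self.history = []

    def _path(self):
        # Nodes from the root down along the reversed recent history.
        nodes = [self.root]
        node = self.root
        for sym in reversed(self.history[-self.depth:]):
            node = node.children.setdefault(sym, Node())
            nodes.append(node)
        return nodes

    def update(self, bit):
        nodes = self._path()
        for i, node in enumerate(reversed(nodes)):  # deepest node first
            d = len(nodes) - 1 - i
            a, b = node.counts
            # Sequential KT update: P(bit) = (count(bit) + 1/2) / (a + b + 1).
            node.log_pe += math.log((node.counts[bit] + 0.5) / (a + b + 1))
            node.counts[bit] += 1
            if d == self.depth:
                node.log_pw = node.log_pe  # max-depth node: no children to mix
            else:
                kids = sum(c.log_pw for c in node.children.values())
                node.log_pw = math.log(0.5) + logsumexp2(node.log_pe, kids)
        self.history.append(bit)

    def predict(self, bit):
        """P(next symbol = bit | history), via a trial update that is undone."""
        nodes = self._path()
        saved = [(n.counts[:], n.log_pe, n.log_pw) for n in nodes]
        before = self.root.log_pw
        self.update(bit)
        prob = math.exp(self.root.log_pw - before)
        self.history.pop()
        for n, (c, pe, pw) in zip(nodes, saved):
            n.counts, n.log_pe, n.log_pw = c, pe, pw
        return prob
```

On a deterministic alternating sequence such as `0, 1, 0, 1, ...`, the mixture quickly concentrates on the depth-1 model, so `predict` of the pattern-consistent next symbol approaches 1. D2-CTW's contribution is to remove the fixed `depth` bound while keeping the number of stored nodes, rather than the depth, bounded.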
Year | Venue | Field |
---|---|---|
2017 | Advances in Neural Information Processing Systems 30 (NIPS 2017) | State representation, Observable, Markov model, Computer science, Algorithm, Context tree weighting, Aliasing, Artificial intelligence, Suffix tree, Machine learning, Bounded function, Reinforcement learning |
DocType | Volume | ISSN
---|---|---
Conference | 30 | 1049-5258

Citations | PageRank | References
---|---|---
0 | 0.34 | 0
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
João V. Messias | 1 | 26 | 4.77 |
Shimon Whiteson | 2 | 1460 | 99.00 |
Messias, Joao V. | 3 | 0 | 0.34 |