Abstract |
---|
Monte Carlo planning has proven successful in many sequential decision-making settings, but it suffers from poor exploration when rewards are sparse. In this paper, we improve exploration in UCT by generalizing across similar states using a given distance metric. When the state space does not have a natural distance metric, we show how to learn a local manifold from the transition graph of states in the near future to obtain one. On domains inspired by video games, empirical evidence shows that our algorithm is more sample efficient than UCT, particularly when rewards are sparse. |
Year | Venue | DocType
---|---|---
2015 | Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence | Conference
Citations | PageRank | References
---|---|---
3 | 0.39 | 8
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sriram Srinivasan | 1 | 379 | 27.92 |
Erik Talvitie | 2 | 148 | 10.11
Michael H. Bowling | 3 | 2460 | 205.07 |