Abstract | ||
---|---|---|
We propose a hybrid approach for improving sample efficiency in goal-directed reinforcement learning. The approach has two steps: first, we approximate a model from model-free reinforcement learning; then, we use this approximate model, together with a notion of reachability based on Mean First Passage Times, to perform model-based reinforcement learning. Building on this observation, we design two new algorithms, Mean First Passage Time based Q-Learning (MFPT-Q) and Mean First Passage Time based DYNA (MFPT-DYNA), which are fundamentally modified versions of state-of-the-art reinforcement learning techniques. Preliminary results show that our hybrid approaches converge in far fewer iterations than their state-of-the-art counterparts and therefore require far fewer samples and training trials. |
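
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the reachability quantity it names, not the authors' MFPT-Q or MFPT-DYNA. It assumes a small discrete state space, a hypothetical list of observed `(state, next_state)` pairs gathered while following a policy, and a single goal state. The function names `estimate_transition_matrix` and `mean_first_passage_times` are illustrative. The mean first passage time to the goal is taken to satisfy m_goal = 0 and m_i = 1 + sum over j != goal of P[i][j] * m_j.

```python
def estimate_transition_matrix(transitions, num_states):
    """Build an empirical state-to-state transition matrix from samples.

    `transitions` is an iterable of (state, next_state) index pairs.
    This stands in for "approximating a model" from experience; it is an
    assumption of this sketch, not the paper's procedure.
    """
    counts = [[0] * num_states for _ in range(num_states)]
    for s, s_next in transitions:
        counts[s][s_next] += 1

    matrix = []
    for s, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Unvisited state: assume a self-loop (illustrative choice).
            # Such a state never reaches the goal, so its passage time
            # keeps growing until max_iters stops the iteration below.
            matrix.append([1.0 if j == s else 0.0 for j in range(num_states)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def mean_first_passage_times(P, goal, tol=1e-10, max_iters=100_000):
    """Expected number of steps to first reach `goal` from each state.

    Uses fixed-point iteration on m_i = 1 + sum_{j != goal} P[i][j] * m_j,
    with m_goal held at 0.
    """
    n = len(P)
    m = [0.0] * n
    for _ in range(max_iters):
        new_m = [
            0.0 if i == goal
            else 1.0 + sum(P[i][j] * m[j] for j in range(n) if j != goal)
            for i in range(n)
        ]
        # Stop once no entry changes by more than the tolerance.
        if max(abs(a - b) for a, b in zip(new_m, m)) < tol:
            return new_m
        m = new_m
    return m
```

States with smaller passage times are, in expectation, closer to the goal under the estimated model, which is one way such values could be used to prioritize updates. For larger state spaces, a direct linear solve of the same equations would likely be preferable to this iteration.
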
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/IROS.2018.8593728 | 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) |
DocType | ISSN | Citations |
---|---|---|
Journal | 2153-0858 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 19 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Shoubhik Debnath | 1 | 0 | 0.68 |
Gaurav S. Sukhatme | 2 | 5469 | 548.13 |
Lantao Liu | 3 | 157 | 16.49 |