Abstract |
---|
We present a modular approach for learning policies for navigation over long planning horizons from language input. Our hierarchical policy operates at multiple timescales, where the higher-level master policy proposes subgoals to be executed by specialized sub-policies. Our choice of subgoals is compositional and semantic, i.e. they can be sequentially combined in arbitrary orderings, and assume human-interpretable descriptions (e.g. 'exit room', 'find kitchen', 'find refrigerator', etc.). We use imitation learning to warm-start policies at each level of the hierarchy, dramatically increasing sample efficiency, followed by reinforcement learning. Independent reinforcement learning at each level of hierarchy enables sub-policies to adapt to consequences of their actions and recover from errors. Subsequent joint hierarchical training enables the master policy to adapt to the sub-policies. |
Year | Venue | DocType |
---|---|---|
2018 | CoRL | Journal |
Volume | Citations | PageRank |
---|---|---|
abs/1810.11181 | 9 | 0.44 |
References | Authors |
---|---|
29 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Abhishek Das | 1 | 433 | 23.54 |
Georgia Gkioxari | 2 | 420 | 31.64 |
Stefan Lee | 3 | 231 | 19.88 |
Devi Parikh | 4 | 2929 | 132.01 |
Dhruv Batra | 5 | 2142 | 104.81 |