Abstract |
---|
Inverse reinforcement learning (IRL) is the problem of learning the preferences of an agent from observations of its behavior on a task. While this problem has been well investigated, the related problem of online IRL, where observations accrue incrementally yet the demands of the application often prohibit a full rerun of an IRL method, has received relatively less attention. We introduce the first formal framework for online IRL, called incremental IRL (I2RL), and a new method that advances maximum entropy IRL with hidden variables to this setting. Our formal analysis shows that the new method has monotonically improving performance with more demonstration data, as well as probabilistically bounded error, under both full and partial observability. Experiments in a simulated robotic application of penetrating a continuous patrol under occlusion show the relatively improved performance and speedup of the new method and validate the utility of online IRL. |
Year | Venue | Field |
---|---|---|
2018 | arXiv: Learning | Monotonic function, Mathematical optimization, Observability, Inverse reinforcement learning, Hidden variable theory, Principle of maximum entropy, Bounded error, Mathematics, Speedup |
DocType | Volume | Citations |
Journal | abs/1805.07871 | 0 |
PageRank | References | Authors |
0.34 | 10 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Saurabh Arora | 1 | 0 | 0.34 |
Prashant Doshi | 2 | 926 | 90.23 |
Bikramjit Banerjee | 3 | 284 | 32.63 |