Abstract |
---|
Many planning methods rely on the use of an immediate reward function as a portable and succinct representation of desired behavior. Rewards are often inferred from demonstrated behavior that is assumed to be near-optimal. We examine a framework, Distance Minimization IRL (DM-IRL), for learning reward functions from scores an expert assigns to possibly suboptimal demonstrations. By changing the expert's role from a demonstrator to a judge, DM-IRL relaxes some of the assumptions present in IRL, enabling learning from the scoring of arbitrary demonstration trajectories with unknown transition functions. DM-IRL complements existing IRL approaches by addressing different assumptions about the expert. We show that DM-IRL is robust to expert scoring error and prove that finding a policy that produces maximally informative trajectories for an expert to score is strongly NP-hard. Experimentally, we demonstrate that the reward function DM-IRL learns from an MDP with an unknown transition model can transfer to an agent with known characteristics in a novel environment, and we achieve successful learning with limited available training data. |
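
As a rough illustration of learning a reward function from expert scores, the sketch below fits reward weights by ordinary least squares. The abstract does not spell out the algorithm, so everything here is an assumption: the reward is taken to be linear in per-state features, each expert score is treated as a noisy discounted return of its trajectory, and the names `discounted_features` and `fit_reward_weights` are hypothetical. No transition model is used, only the scored trajectories themselves.

```python
import numpy as np

def discounted_features(trajectory, gamma):
    """Discounted sum of per-state feature vectors along one trajectory."""
    traj = np.asarray(trajectory, dtype=float)   # shape (T, d)
    discounts = gamma ** np.arange(len(traj))    # shape (T,)
    return discounts @ traj                      # shape (d,)

def fit_reward_weights(trajectories, scores, gamma=0.9):
    """Least-squares weights w so that w . mu(trajectory) approximates each score.

    Assumption (not from the source): reward is linear in the features.
    """
    X = np.stack([discounted_features(t, gamma) for t in trajectories])
    w, *_ = np.linalg.lstsq(X, np.asarray(scores, dtype=float), rcond=None)
    return w

# Synthetic check: arbitrary (not near-optimal) trajectories, noisy scores.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])              # hypothetical ground-truth weights
trajectories = [rng.normal(size=(10, 3)) for _ in range(50)]
scores = [discounted_features(t, 0.9) @ true_w + rng.normal(scale=0.01)
          for t in trajectories]

learned_w = fit_reward_weights(trajectories, scores, gamma=0.9)
```

Because the fit depends only on trajectory features and scores, the demonstrations need not come from an optimal policy, which mirrors the judge-rather-than-demonstrator setting the abstract describes.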
Year | Venue | Field |
---|---|---|
2016 | AAAI | Training set,Computer science,Inverse optimal control,Inverse reinforcement learning,Learning from demonstration,Minification,Artificial intelligence,Error-driven learning,Machine learning,Robotics,Reinforcement learning |

DocType | Citations | PageRank |
---|---|---|
Conference | 5 | 0.44 |

References | Authors |
---|---|
10 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Benjamin Burchfiel | 1 | 5 | 1.46 |
Carlo Tomasi | 2 | 8314 | 679.81 |
Ronald Parr | 3 | 2428 | 186.85 |