Abstract
This work addresses the problem of *inverse reinforcement learning* in Markov decision processes where the decision-making agent is *risk-sensitive*. In particular, a risk-sensitive reinforcement learning algorithm with convergence guarantees is presented; it makes use of coherent risk metrics and of models of human decision-making that originate in behavioral psychology and economics. This algorithm provides the theoretical underpinning for a gradient-based inverse reinforcement learning algorithm that minimizes a loss function defined on the observed behavior. It is shown that the gradient of the loss function with respect to the model parameters is well defined and computable via a contraction-map argument. The proposed technique is evaluated on a *Grid World* example, a canonical benchmark problem.
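The abstract's key computational ingredient is a Bellman-style backup in which a coherent risk metric replaces the usual expectation over next states. As a rough, self-contained illustration (this is not the paper's algorithm; the function names are made up, and CVaR is used only as one standard example of a coherent risk metric), the sketch below runs risk-sensitive value iteration on a tabular MDP. Because CVaR is monotone and translation-invariant, the risk-sensitive backup remains a γ-contraction, so plain fixed-point iteration converges, the same style of contraction-map reasoning the abstract invokes.

```python
import numpy as np

def cvar(values, probs, alpha):
    """Lower-tail CVaR_alpha of a discrete distribution: the
    probability-weighted average of the worst alpha-mass of outcomes."""
    order = np.argsort(values)                # worst outcomes first
    v, p = values[order], probs[order]
    cum = np.cumsum(p)
    # probability mass each outcome contributes to the alpha-tail
    tail = np.clip(alpha - (cum - p), 0.0, p)
    return float(tail @ v) / alpha

def risk_sensitive_value_iteration(P, R, gamma=0.95, alpha=0.3, tol=1e-8):
    """P: (A, S, S) transition tensor, R: (S, A) reward matrix.
    Bellman backup with CVaR_alpha (a coherent risk metric) in place of
    the expectation; the backup is still a gamma-contraction, so
    fixed-point iteration converges to the risk-sensitive value."""
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                Q[s, a] = R[s, a] + gamma * cvar(V, P[a, s], alpha)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)    # values and a greedy policy
        V = V_new
```

A gradient-based inverse-RL outer loop of the kind the abstract describes would parameterize R, compare the induced behavior against observed demonstrations via a loss, and differentiate that loss through the fixed point; that outer loop, and the paper's specific behavioral models of human decision-making, are beyond this sketch.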
Year | DOI | Venue
---|---|---|
2020 | 10.1109/TAC.2019.2926674 | IEEE Transactions on Automatic Control

Keywords | Field | DocType
---|---|---|
Autonomous systems, Markov processes, optimization, reinforcement learning | Inverse, Markov decision process, Inverse reinforcement learning, Artificial intelligence, Behavioral economics, Travel time, Grid, Machine learning, Mathematics, Reinforcement learning | Journal

Volume | Issue | ISSN
---|---|---|
65 | 3 | 0018-9286

Citations | PageRank | References
---|---|---|
2 | 0.38 | 13
Authors (2)

Name | Order | Citations | PageRank |
---|---|---|---|
Lillian J. Ratliff | 1 | 87 | 23.32 |
Eric Mazumdar | 2 | 13 | 7.50 |