Abstract
---
Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their native units. In this paper we propose a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way. We propose to learn an action distribution for each objective, and we use supervised learning to fit a parametric policy to a combination of these distributions. We demonstrate the effectiveness of our approach on challenging high-dimensional real and simulated robotics tasks, and show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
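
The two-step recipe in the abstract (learn one improved action distribution per objective, then fit a single parametric policy to their combination by supervised learning) can be sketched in a few lines. The Python snippet below is a minimal sketch under stated assumptions, not the paper's implementation: the objective names, Q-values, and fixed temperatures `etas` are invented for illustration, and a tabular policy stands in for the parametric one. In the paper's framework the per-objective temperatures would follow from preference constraints rather than being set by hand, which is what makes the preferences scale-invariant.

```python
# Hypothetical sketch of per-objective policy improvement followed by a
# supervised fit; all names and numbers below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_actions = 4

# Per-objective action values for a single state, on deliberately
# different scales (the setting the abstract describes).
q_values = {
    "task_reward": rng.normal(0.0, 10.0, size=n_actions),
    "action_penalty": rng.normal(0.0, 0.1, size=n_actions),
}
# Per-objective temperatures (assumed fixed here; a smaller value
# makes that objective's improved distribution greedier).
etas = {"task_reward": 1.0, "action_penalty": 5.0}

policy = np.full(n_actions, 1.0 / n_actions)  # current (tabular) policy

# Step 1: one improved, non-parametric action distribution per objective,
# q_k(a) proportional to pi(a) * exp(Q_k(a) / eta_k); subtracting the max
# is only for numerical stability.
improved = {}
for k, q in q_values.items():
    w = policy * np.exp((q - q.max()) / etas[k])
    improved[k] = w / w.sum()

# Step 2: supervised fit of the policy to the combination of these
# distributions. For a tabular policy, minimizing the summed
# cross-entropies is solved exactly by averaging the targets.
policy = np.mean(list(improved.values()), axis=0)
print(policy, policy.sum())  # a valid distribution over actions
```
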
Year | Venue | DocType |
---|---|---|
2020 | ICML | Conference |

Citations | PageRank | References
---|---|---
0 | 0.34 | 0

Authors (10)
---

Name | Order | Citations | PageRank |
---|---|---|---|
Abbas Abdolmaleki | 1 | 46 | 12.82 |
Sandy H. Huang | 2 | 67 | 4.65 |
Leonard Hasenclever | 3 | 20 | 5.42 |
M. Neunert | 4 | 65 | 9.95 |
H. Francis Song | 5 | 105 | 5.14 |
Martina Zambelli | 6 | 1 | 1.03 |
Murilo F. Martins | 7 | 1 | 0.69 |
Nicolas Heess | 8 | 1762 | 94.77 |
R. Hadsell | 9 | 1678 | 100.80 |
Martin Riedmiller | 10 | 5655 | 366.29 |