Abstract
---
A novel optimization approach is proposed for policy gradient methods and evolution strategies in reinforcement learning (RL). The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that exploits the geometry induced by a Wasserstein penalty to speed optimization. This method follows the recent theme in RL of including a divergence penalty in the objective to establish a trust region. Experiments on challenging tasks demonstrate improvements in both computational cost and performance over advanced baselines.
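As a rough sketch of the idea summarized in the abstract (the symbols below, such as $J$, $\lambda$, $\eta$, $W_2$, and the metric $G$, are illustrative assumptions rather than notation taken from the paper), the divergence-penalized trust-region objective and a generic Wasserstein natural gradient step might be written as:

```latex
% Hedged sketch, not the paper's exact formulation:
% maximize the RL objective J(theta) while penalizing Wasserstein
% movement of the policy away from the current iterate pi_{theta_k}.
\[
  \theta_{k+1} \;=\; \arg\max_{\theta}\; J(\theta) \;-\; \lambda\, W_2^2\!\left(\pi_{\theta_k},\, \pi_{\theta}\right)
\]
% A Wasserstein natural gradient update preconditions the ordinary
% gradient with the (pseudo-)inverse of the Wasserstein metric G,
% so steps are measured in the geometry induced by the penalty.
\[
  \theta_{k+1} \;=\; \theta_k \;+\; \eta\, G(\theta_k)^{-1}\, \nabla_{\theta} J(\theta_k)
\]
```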
Year | Venue | DocType |
---|---|---|
2021 | ICLR | Conference |
Citations | PageRank | References
---|---|---
0 | 0.34 | 0
Authors
---
4
Name | Order | Citations | PageRank |
---|---|---|---|
Ted Moskovitz | 1 | 0 | 1.01 |
Michael Arbel | 2 | 10 | 4.15 |
Ferenc Huszar | 3 | 583 | 22.66 |
Arthur Gretton | 4 | 3638 | 226.18 |