Title |
---|
Disentangling Dynamics and Returns: Value Function Decomposition with Future Prediction |
Abstract |
---|
Value functions are crucial for model-free Reinforcement Learning (RL), either to obtain a policy implicitly or to guide policy updates. Value estimation depends heavily on the stochasticity of environmental dynamics and the quality of reward signals. In this paper, we propose a two-step understanding of value estimation from the perspective of future prediction, by decomposing the value function into a reward-independent future dynamics part and a policy-independent trajectory return part. We then derive a practical deep RL algorithm from this decomposition, consisting of a convolutional trajectory representation model, a conditional variational dynamics model that predicts the expected representation of the future trajectory, and a convex trajectory return model that maps a trajectory representation to its return. Our algorithm is evaluated on MuJoCo continuous control tasks and shows superior results under both standard and delayed-reward settings. |
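The two-step decomposition in the abstract can be illustrated with a minimal sketch: a dynamics model predicts the expected representation of the future trajectory from a state, and a return model maps that representation to a scalar return; the value estimate is their composition. The linear maps, dimensions, and function names below are illustrative placeholders, not the paper's architecture (which uses a convolutional trajectory encoder, a conditional variational dynamics model, and a convex return model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for the sketch.
STATE_DIM, REPR_DIM = 4, 8

# Stand-in "models" as fixed linear maps.
W_dyn = rng.normal(size=(REPR_DIM, STATE_DIM))  # state -> expected future-trajectory representation
w_ret = rng.normal(size=REPR_DIM)               # trajectory representation -> scalar return

def predict_representation(state):
    """Reward-independent part: expected representation of the future trajectory."""
    return W_dyn @ state

def trajectory_return(representation):
    """Policy-independent part: maps a trajectory representation to its return."""
    return float(w_ret @ representation)

def value_estimate(state):
    """Two-step value estimate: predict future dynamics, then score the return."""
    return trajectory_return(predict_representation(state))

s = rng.normal(size=STATE_DIM)
print(value_estimate(s))
```

In this sketch, the value function is recovered exactly as the composition of the two parts, which is the structural point of the decomposition: the dynamics model never sees rewards, and the return model never sees the policy.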
Year | Venue | DocType | Volume | Citations | PageRank | References | Authors
---|---|---|---|---|---|---|---
2019 | arXiv: Learning | Journal | abs/1905.11100 | 0 | 0.34 | 0 | 7
Name | Order | Citations | PageRank |
---|---|---|---|
Hongyao Tang | 1 | 1 | 2.72 |
Jianye Hao | 2 | 189 | 55.78 |
Guangyong Chen | 3 | 41 | 8.39
Pengfei Chen | 4 | 82 | 5.30 |
Zhaopeng Meng | 5 | 20 | 3.10 |
Yaodong Yang | 6 | 0 | 1.01 |
Li Wang | 7 | 250 | 56.88 |