An Off-Policy Trust Region Policy Optimization Method With Monotonic Improvement Guarantee for Deep Reinforcement Learning | 0 | 0.34 | 2022 |
A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning | 0 | 0.34 | 2018 |
Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network | 0 | 0.34 | 2018 |
A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning | 2 | 0.39 | 2018 |
Two-Bit Networks for Deep Learning on Resource-Constrained Embedded Devices | 6 | 0.53 | 2017 |