Abstract | ||
---|---|---|
In the traditional reinforcement learning paradigm, a reward signal is applied to define the goal of the task. Usually, the reward signal is a "hand-crafted" numerical value or a pre-defined function: it tells the agent how good or bad a specific action is. However, we believe there exist situations in which the environment cannot directly provide such a reward signal to the agent. Therefore, the ... |
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/MCI.2018.2840727 | IEEE Computational Intelligence Magazine |
Keywords | Field | DocType |
Neural networks, Robot learning, Learning (artificial intelligence), Task analysis, Dynamic programming, Machine learning | Dynamic programming, Inverted pendulum, Task analysis, Computer science, Artificial intelligence, Robot, Artificial neural network, Machine learning, Reinforcement learning | Journal |
Volume | Issue | ISSN |
13 | 3 | 1556-603X |
Citations | PageRank | References |
2 | 0.37 | 0 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Haibo He | 1 | 3653 | 213.96 |
Xiangnan Zhong | 2 | 346 | 16.35 |