Abstract
---
Sequence generation with reinforcement learning (RL) has received significant attention recently. However, a challenge with such methods is the sparse-reward problem in the RL training process, in which a scalar guiding signal is often only available after an entire sequence has been generated. This type of sparse reward tends to ignore the global structural information of a sequence, causing the model to generate semantically inconsistent sequences. In this paper, we present a model-based RL approach to overcome this issue. Specifically, we propose a novel guider network to model the sequence-generation environment, which can assist next-word prediction and provide intermediate rewards for generator optimization. Extensive experiments show that the proposed method leads to improved performance for both unconditional and conditional sequence-generation tasks.
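The abstract describes the guider network only at a high level. Below is a minimal sketch of how such an intermediate reward could be computed: a guider predicts the feature of the future sequence from the feature of the current prefix, and the agreement between that prediction and the feature of what the generator actually produced serves as a dense, per-step reward. The names (`Guider`, `intermediate_reward`, `feat_dim`), the GRU architecture, and the cosine-similarity reward are illustrative assumptions based on the abstract, not the paper's released implementation.

```python
# A minimal sketch of guider-based intermediate rewards (assumed design,
# not the paper's code). Prefix features would come from an encoder over
# the tokens generated so far.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Guider(nn.Module):
    """Predicts the feature of the future sequence from the feature of
    the current prefix; prediction agreement yields a reward."""
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.rnn = nn.GRUCell(feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, feat_dim)

    def forward(self, prefix_feat, h):
        h = self.rnn(prefix_feat, h)          # update guider state
        return self.out(h), h                 # predicted future feature

def intermediate_reward(pred_future_feat, actual_future_feat):
    # Cosine similarity between the guider's prediction and the feature
    # of the actually generated continuation: high similarity means the
    # continuation is consistent with the predicted global structure.
    return F.cosine_similarity(pred_future_feat, actual_future_feat, dim=-1)

# Toy usage with random features standing in for encoder outputs.
feat_dim, hidden_dim, batch = 32, 64, 4
guider = Guider(feat_dim, hidden_dim)
h = torch.zeros(batch, hidden_dim)
prefix_feat = torch.randn(batch, feat_dim)    # feature of prefix y_{<t}
pred, h = guider(prefix_feat, h)
actual_feat = torch.randn(batch, feat_dim)    # feature of longer prefix
r_t = intermediate_reward(pred, actual_feat)  # dense per-step reward
print(r_t.shape)                              # torch.Size([4])
```

Such a dense signal can then be combined with the usual end-of-sequence reward in a policy-gradient update, so the generator receives guidance at every step rather than only after the full sequence is produced.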
Year | Venue | DocType
---|---|---
2018 | arXiv: Computation and Language | Journal

Volume | Citations | PageRank
---|---|---
abs/1811.00696 | 2 | 0.35

References | Authors
---|---
21 | 8
Name | Order | Citations | PageRank |
---|---|---|---
Ruiyi Zhang | 1 | 3 | 2.41 |
Changyou Chen | 2 | 365 | 36.95 |
Zhe Gan | 3 | 319 | 32.58 |
Wenlin Wang | 4 | 51 | 7.06 |
Liqun Chen | 5 | 2082 | 139.89 |
Dinghan Shen | 6 | 108 | 10.37 |
Guoyin Wang | 7 | 24 | 7.38 |
L. Carin | 8 | 4603 | 339.36 |