Abstract |
---|
Collecting data for training dialogue systems can be extremely expensive because it involves human participants and requires extensive annotation. In document-grounded dialogue systems especially, human experts must carefully read unstructured documents to answer users' questions. As a result, existing document-grounded dialogue datasets are relatively small, which hinders the effective training of dialogue systems. In this paper, we propose an automatic, document-grounded data augmentation technique based on a generative dialogue model. The dialogue model consists of a user bot and an agent bot that synthesize diverse dialogues from an input document; the synthesized dialogues are then used to train a downstream model. When used to supplement the original dataset, our method achieves significant improvement over traditional data augmentation methods. It also performs well in the low-resource setting. |
Year | Venue | DocType |
---|---|---|
2022 | SIGdial Meetings (SIGDIAL) | Conference |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors |
---|
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Qingyang Wu | 1 | 0 | 2.37 |
Song Feng | 2 | 280 | 19.55 |
Derek Chen | 3 | 0 | 0.68 |
Sachindra Joshi | 4 | 0 | 1.01 |
Luis A. Lastras | 5 | 0 | 0.34 |
Zhou Yu | 6 | 56 | 9.94