Data augmentation for low-resource languages NMT guided by constrained sampling - Citegraph

Paper Info

Title
Data augmentation for low-resource languages NMT guided by constrained sampling

Abstract
Data augmentation (DA) is a ubiquitous approach in text generation tasks. In machine translation, and especially in low-resource language scenarios, many DA methods have been proposed. The most common ones build a pseudo-corpus by randomly sampling, omitting, or replacing words in the text. However, these approaches can hardly guarantee the quality of the augmented data. In this study, we augment the corpus with a constrained sampling method. We also build an evaluation framework to select higher-quality data after augmentation: a discriminator submodel mitigates syntactic and semantic errors to some extent. Experimental results show that our augmentation method consistently outperforms all previous state-of-the-art methods on both small- and large-scale corpora, across eight language pairs from four corpora, by 2.38-4.18 BLEU (bilingual evaluation understudy) points.
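
For orientation, the baseline word-level operations the abstract mentions (omitting or replacing words at random to build a pseudo-corpus) might look roughly like the sketch below. This is not the paper's constrained sampling method or its discriminator, neither of which is specified in this record; the function names `drop_words`, `replace_words`, and `augment`, the probability `p`, and the choice to perturb only the source side are all assumptions made for illustration.

```python
import random


def drop_words(tokens, p, rng):
    """Omit each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def replace_words(tokens, p, vocab, rng):
    """Swap each token for a random vocabulary word with probability p."""
    return [rng.choice(vocab) if rng.random() < p else t for t in tokens]


def augment(pairs, p=0.1, seed=0):
    """Build a pseudo-corpus from (source, target) sentence pairs.

    Hypothetical baseline: only the source side is perturbed and the
    target is kept unchanged. Nothing here checks the quality of the
    result, which is the weakness the abstract points out.
    """
    rng = random.Random(seed)
    # Replacement words are drawn from the source-side vocabulary.
    vocab = sorted({w for src, _ in pairs for w in src.split()})
    pseudo = []
    for src, tgt in pairs:
        tokens = src.split()
        pseudo.append((" ".join(drop_words(tokens, p, rng)), tgt))
        pseudo.append((" ".join(replace_words(tokens, p, vocab, rng)), tgt))
    return pseudo
```

Calling `augment(pairs)` on a list of sentence pairs returns two noisy variants per input pair. Because the perturbations are unconstrained, some variants will be ungrammatical or change the meaning, which is the kind of error a quality filter would need to catch.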

Field	Value
Year	2022
DOI	10.1002/int.22616
Venue	INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
Keywords	artificial intelligence, constrained sampling, data augmentation, low-resource languages, natural language processing, neural machine translation
DocType	Journal
Volume	37
Issue	1
ISSN	0884-8173
Citations	0
PageRank	0.34
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Mieradilijiang Maimaiti	1	0	2.03
Yang Liu	2	0	0.34
Huanbo Luan	3	0	0.34
Maosong Sun	4	2293	162.86