Title
Noising Scheme for Data Augmentation in Automatic Post-Editing.
Abstract
This paper describes POSTECH’s submission to WMT20 for the shared task on Automatic Post-Editing (APE). Our focus is on increasing the quantity of available APE data to overcome the shortage of human-crafted training data. In our experiment, we implemented a noising module that simulates four types of post-editing errors, and we introduced this module into a Transformer-based multi-source APE model. Our noising module implants errors into texts on the target side of parallel corpora during the training phase to make synthetic MT outputs, increasing the entire number of training samples. We also generated additional training data using the parallel corpora and NMT model that were released for the Quality Estimation task, and we used these data to train our APE model. Experimental results on the WMT20 English-German APE data set show improvements over the baseline in terms of both the TER and BLEU scores: our primary submission achieved an improvement of -3.15 TER and +4.01 BLEU, and our contrastive submission achieved an improvement of -3.34 TER and +4.30 BLEU.
Year
Venue
DocType
2020
WMT@EMNLP
Conference
Citations 
PageRank 
References 
0
0.34
0
Authors
5
Name
Order
Citations
PageRank
WonKee Lee102.03
Jaehun Shin201.01
Baikjin Jung301.35
Jihyung Lee401.01
Jong-Hyeok Lee574097.88