Title
Data Augmentation By Concatenation For Low-Resource Translation: A Mystery And A Solution
Abstract
In this paper, we investigate the driving factors behind concatenation, a simple but effective data augmentation method for low-resource neural machine translation. Our experiments suggest that discourse context is unlikely the cause for concatenation improving BLEU by about +1 across four language pairs. Instead, we demonstrate that the improvement comes from three other factors unrelated to discourse: context diversity, length diversity, and (to a lesser extent) position shifting.
Year
Venue
DocType
2021
IWSLT 2021: THE 18TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION
Conference
Citations 
PageRank 
References 
0
0.34
0
Authors
3
Name
Order
Citations
PageRank
Toan Nguyen100.68
Kenton Murray200.34
David Chiang32843144.76