Title
Sample Selection for Large-scale MT Discriminative Training.
Abstract
Discriminative training for MT usually involves numerous features and requires largescale training set to reach reliable parameter estimation. Other than using the expensive human-labeled parallel corpora for training, semi-supervised methods have been proposed to generate huge amount of “hallucinated” data which relieves the data sparsity problem. However the large training set contains both good samples which are suitable for training and bad ones harmful to the training. How to select training samples from vast amount of data can greatly affect the training performance. In this paper we propose a method for selecting samples that are most suitable for discriminative training according to a criterion measuring the dataset quality. Our experimental results show that by adding samples to the training set selectively, we are able to exceed the performance of system trained with the same amount of samples selected randomly.
Year
Venue
DocType
2020
AMTA
Conference
Citations 
PageRank 
References 
0
0.34
0
Authors
2
Name
Order
Citations
PageRank
Yuan Cao154835.60
Sanjeev Khudanpur22155202.00