Sample Selection for Large-scale MT Discriminative Training. - Citegraph

Paper Info

Title
Sample Selection for Large-scale MT Discriminative Training.

Abstract
Discriminative training for MT usually involves numerous features and requires a large-scale training set to reach reliable parameter estimates. Rather than relying on expensive human-labeled parallel corpora for training, semi-supervised methods have been proposed to generate huge amounts of “hallucinated” data, which relieves the data sparsity problem. However, the large training set contains both good samples that are suitable for training and bad ones that are harmful to it. How training samples are selected from this vast amount of data can greatly affect training performance. In this paper we propose a method for selecting the samples that are most suitable for discriminative training according to a criterion measuring dataset quality. Our experimental results show that by adding samples to the training set selectively, we are able to exceed the performance of a system trained with the same number of samples selected randomly.

Year	Venue	DocType
2020	AMTA	Conference
Citations	PageRank	References	Authors
0	0.34	0	2

Authors (2 rows)

Name	Order	Citations	PageRank
Yuan Cao	1	548	35.60
Sanjeev Khudanpur	2	2155	202.00
