Effective Selection Of Translation Model Training Data - Citegraph

Paper Info

Title
Effective Selection Of Translation Model Training Data

Abstract
Data selection has been demonstrated to be an effective approach to addressing the lack of high-quality bitext for statistical machine translation in the domain of interest. Most current data selection methods rely solely on language models trained on small-scale in-domain data to select domain-relevant sentence pairs from a general-domain parallel corpus. By contrast, we argue that the relevance between a sentence pair and the target domain can be better evaluated by combining a language model with a translation model. In this paper, we study and experiment with novel methods that apply translation models to domain-relevant data selection. The results show that our methods outperform previous methods. When the selected sentence pairs are evaluated on an end-to-end MT task, our methods can increase translation performance by 3 BLEU points.

Year	Venue	Field
2014	Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Vol. 2	Rule-based machine translation, BLEU, Data selection, Evaluation of machine translation, Computer science, Machine translation, Artificial intelligence, Natural language processing, Language model, Training set, Speech recognition, Sentence, Machine learning
DocType	Volume	Citations
Conference	P14-2	5
PageRank	References	Authors
0.40	15	5

Authors (5 rows)

Name	Order	Citations	PageRank
Le Liu	1	30	8.45
Yu Hong	2	246	35.44
Hao Liu	3	5	0.40
Xing Wang	4	58	10.07
Jianmin Yao	5	131	16.96
