Domain adaptation of statistical machine translation with domain-focused web crawling - Citegraph

Paper Info

Title
Domain adaptation of statistical machine translation with domain-focused web crawling

Abstract
In this paper, we tackle the problem of domain adaptation of statistical machine translation (SMT) by exploiting domain-specific data acquired by domain-focused crawling of text from the World Wide Web. We design and empirically evaluate a procedure for the automatic acquisition of monolingual and parallel text and for its exploitation in system training, tuning, and testing in a phrase-based SMT framework. We present a strategy for using such resources depending on their availability and quantity, supported by the results of a large-scale evaluation carried out for two domains (environment and labour legislation), two language pairs (English–French and English–Greek), and both translation directions (into and from English). In general, machine translation systems trained and tuned on a general domain perform poorly on specific domains. We show that such systems can be adapted successfully by retuning model parameters on small amounts of in-domain parallel data, and can be further improved by using additional monolingual and parallel training data to adapt the language and translation models. The average observed improvement is substantial: 15.30 BLEU points absolute.

Year: 2015
DOI: 10.1007/s10579-014-9282-3
Venue: Language Resources and Evaluation
Keywords: Domain adaptation, Optimisation, Statistical machine translation, Web crawling
Field: Training set, Crawling, Domain adaptation, Computer science, Evaluation of machine translation, Machine translation, Phrase, Natural language processing, Transfer-based machine translation, Artificial intelligence, Web crawler
DocType: Journal
Volume: 49
Issue: 1
ISSN: 1574-020X
Citations: 3
PageRank: 0.39
References: 58
Authors: 7

Authors

Name	Order	Citations	PageRank
Pavel Pecina	1	558	52.31
Antonio Toral	2	47	10.43
Vassilis Papavassiliou	3	120	10.74
Prokopis Prokopidis	4	114	10.95
Aleš Tamchyna	5	115	14.76
Andy Way	6	881	126.78
Josef van Genabith	7	1037	105.64