Domain Adaptation in Statistical Machine Translation of User-Forum Data using Component Level Mixture Modelling. - Citegraph

Paper Info

Title
Domain Adaptation in Statistical Machine Translation of User-Forum Data using Component Level Mixture Modelling.

Abstract
This paper reports experiments on adapting components of a Statistical Machine Translation (SMT) system for the task of translating online user-generated forum data from Symantec. Such data is monolingual, and differs from available bitext MT training resources in a number of important respects. For this reason, adaptation techniques are important to achieve optimal results. We investigate the use of mixture modelling to adapt our models for this specific task. Individual models, created from different in-domain and out-of-domain data sources, are combined using linear and log-linear weighting methods for the different components of an SMT system. The results show a more profound effect of language model adaptation over translation model adaptation with respect to translation quality. Surprisingly, linear combination outperforms log-linear combination of the models. The best adapted systems provide a statistically significant improvement of 1.78 absolute BLEU points (6.85% relative) and 2.73 absolute BLEU points (8.05% relative) over the baseline system for English–German and English–French, respectively.
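
The abstract contrasts linear and log-linear weighting of component models. As a rough illustration only, and not the authors' implementation, the sketch below combines the probabilities that several models assign to the same event in both ways. The function names, the example probabilities, and the weights are all hypothetical; the paper's actual models, weights, and tuning procedure are not given in this record.

```python
import math

def linear_mix(probs, weights):
    """Linear interpolation: p(x) = sum_i w_i * p_i(x).

    Assumes the weights are non-negative and sum to 1, so the result
    is itself a probability.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("linear mixture weights must sum to 1")
    return sum(w * p for w, p in zip(weights, probs))

def log_linear_mix(probs, weights):
    """Log-linear combination: score(x) = exp(sum_i w_i * log p_i(x)).

    The result is an unnormalised score (a weighted geometric mean when
    the weights sum to 1). Every probability must be strictly positive,
    because log(0) is undefined.
    """
    if any(p <= 0.0 for p in probs):
        raise ValueError("log-linear combination needs strictly positive probabilities")
    return math.exp(sum(w * math.log(p) for w, p in zip(weights, probs)))

# Hypothetical probabilities from an in-domain and an out-of-domain model
# for the same n-gram or phrase pair, with equal (made-up) weights.
probs = [0.04, 0.16]
weights = [0.5, 0.5]

linear_score = linear_mix(probs, weights)
log_linear_score = log_linear_mix(probs, weights)
```

One difference the sketch makes visible: under linear interpolation a model that assigns a very low probability is averaged out by the others, whereas under log-linear combination a single low probability pulls the whole score down multiplicatively.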

Year	Venue	DocType	Citations	PageRank	References	Authors
2011	MTSummit	Conference	0	0.34	0	5

Authors (5 rows)


Name	Order	Citations	PageRank
Pratyush Banerjee	1	52	6.57
Sudip Kumar Naskar	2	210	37.38
Johann Roturier	3	0	2.70
Andy Way	4	881	126.78
Josef Van Genabith	5	0	0.34
