Abstract |
---|
Massive amounts of data for data mining consist of natural language data. A challenge in natural language is to translate the data into a particular language. Machine translation can do the translation automatically. However, models trained on data from one domain tend to perform poorly on other domains. One way to resolve this issue is to train domain adaptation translation and language models. In this work, we use visualizations to analyze the similarities of domains and explore domain detection methods that use text clustering and domain language models to discover the domain of the test data. Furthermore, we present domain adaptation language models based on a tunable discounting mechanism and domain interpolation. Cross-domain evaluation of the language models is performed based on perplexity, in which considerable improvements are obtained. The performance of the domain adaptation models is also evaluated in Chinese-to-English machine translation tasks. The experimental BLEU scores indicate that the domain adaptation system significantly outperforms the baseline, especially in domain adaptation scenarios. |
Year | DOI | Venue |
---|---|---|
2015 | 10.3233/IFS-151981 | JOURNAL OF INTELLIGENT & FUZZY SYSTEMS |
Keywords | Field | DocType |
Text clustering, domain detection, domain adaptation, language models, machine translation | Perplexity, Computer science, Document clustering, Interpolation, Machine translation, Natural language, Artificial intelligence, Test data, Natural language processing, Transfer-based machine translation, Machine learning, Language model | Journal |
Volume | Issue | ISSN |
29 | 6 | 1064-1246 |
Citations | PageRank | References |
0 | 0.34 | 16 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Junfei Guo | 1 | 7 | 3.01 |
Juan Liu | 2 | 1128 | 145.32 |
Qi Han | 3 | 11 | 4.90 |
Xianlong Chen | 4 | 1 | 0.69 |
Yi Zhao | 5 | 1 | 1.04 |