Abstract | ||
---|---|---|
This work presents improvements to a large-scale Arabic-to-French statistical machine translation system over a period of three years. The development includes better preprocessing, more training data, additional genre-specific tuning for two domains, namely newswire text and broadcast news transcripts, and improved domain-dependent language models. Starting from an early prototype that participated in the second CESTA evaluation in 2005, the system was upgraded to achieve favorable BLEU scores of 44.8% for the text setting and 41.1% for the audio setting. These results are compared to a system based on the freely available Moses toolkit. We show significant gains both in translation quality (up to +1.2% BLEU absolute) and in translation speed (up to 16 times faster) for comparable configuration settings. |
Year | Venue | Keywords |
---|---|---|
2008 | Sixth International Conference on Language Resources and Evaluation, LREC 2008 | language model |
Field | DocType | Citations |
Broadcasting, Example-based machine translation, BLEU, Arabic, Computer science, Machine translation, Speech recognition, Machine translation software usability, Preprocessor, Artificial intelligence, Natural language processing, Language model | Conference | 2 |
PageRank | References | Authors |
0.35 | 7 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sasa Hasan | 1 | 245 | 17.35 |
Hermann Ney | 2 | 14178 | 1506.93 |