Abstract |
---|
Many machine translation evaluation metrics have been proposed since the seminal BLEU metric, and many of them have been found to consistently outperform BLEU, as demonstrated by their better correlations with human judgment. It has long been hoped that by tuning machine translation systems against these new-generation metrics, advances in automatic machine translation evaluation would lead directly to advances in automatic machine translation. To date, however, there has been no unambiguous report that these new metrics can improve a state-of-the-art machine translation system over its BLEU-tuned baseline. In this paper, we demonstrate that tuning Joshua, a hierarchical phrase-based statistical machine translation system, with the TESLA metrics results in significantly better human-judged translation quality than the BLEU-tuned baseline. TESLA-M in particular is simple and performs well in practice on large datasets. We release our implementation under an open source license. We hope that this work will encourage the machine translation community to finally move away from BLEU as the unquestioned default and to consider the new-generation metrics when tuning their systems. |
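For context on the baseline metric the abstract argues against, the sketch below implements sentence-level BLEU in its textbook form (clipped n-gram precisions up to 4-grams, geometric mean, brevity penalty). This is a minimal single-reference, unsmoothed illustration, not the implementation used in the paper's experiments or in Joshua's tuning loop.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU: single reference, uniform weights, no smoothing."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clipped matches: each hypothesis n-gram is credited at most as
        # many times as it occurs in the reference.
        match = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(1, sum(hyp_counts.values()))
        if match == 0:
            return 0.0  # unsmoothed: any zero precision zeroes the score
        precisions.append(match / total)
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(hypothesis))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

BLEU's reliance on exact n-gram overlap is precisely what metrics like TESLA relax (e.g. via synonym and part-of-speech matching), which is why better correlation with human judgment is possible.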
Year | Venue | Keywords |
---|---|---|
2011 | EMNLP | machine translation community, human-judged translation quality, automatic machine translation, BLEU-tuned baseline, better machine translation, state-of-the-art machine translation system, translation system, machine translation evaluation metrics, tuning machine translation system, automatic machine translation evaluation, new generation metrics, better evaluation metrics |
Field | DocType | Volume
---|---|---|
BLEU, Evaluation of machine translation, Computer science, Machine translation, Machine translation system, Phrase, Human judgment, ROUGE, Natural language processing, Artificial intelligence, Machine learning, License | Conference | D11-1
Citations | PageRank | References
---|---|---|
20 | 2.63 | 21
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Chang Liu | 1 | 87 | 6.78
Daniel Dahlmeier | 2 | 460 | 29.67 |
Hwee Tou Ng | 3 | 4092 | 300.40 |