Title
Labeling hierarchical phrase-based models without linguistic resources.
Abstract
Long-range word order differences are a well-known problem for machine translation. Unlike the standard phrase-based models, which work with sequential and local phrase reordering, the hierarchical phrase-based model (Hiero) embeds the reordering of phrases within pairs of lexicalized context-free rules. This allows the model to handle long-range reordering recursively. However, the Hiero grammar works with a single nonterminal label, which means that the rules are combined into derivations independently and without reference to context outside the rules themselves. Follow-up work explored remedies involving nonterminal labels obtained from monolingual parsers and taggers. As yet, no labeling mechanisms exist for the many languages for which there are no good-quality parsers or taggers. In this paper we contribute a novel approach for acquiring reordering labels for Hiero grammars directly from the word-aligned parallel training corpus, without the use of any taggers or parsers. The new labels represent types of alignment patterns in which a phrase pair is embedded within larger phrase pairs. In order to obtain alignment patterns that generalize well, we propose to decompose word alignments into trees over phrase pairs. Besides this labeling approach, we contribute coarse and sparse features for learning soft, weighted label substitution as opposed to standard substitution. We report extensive experiments comparing our model to two baselines: Hiero and the well-known syntax-augmented machine translation (SAMT) variant, which labels Hiero rules with nonterminals extracted from monolingual syntactic parses. We also test a simplified labeling scheme based on inversion transduction grammar (ITG). For the Chinese–English task we obtain a performance improvement of up to 1 BLEU point, whereas for the German–English task, where morphology is an issue, a minor (but statistically significant) improvement of 0.2 BLEU points is reported over SAMT. While ITG labeling does give a performance improvement, it is sometimes suboptimal relative to our proposed labeling scheme.
Year
2015
DOI
10.1007/s10590-015-9177-0
Venue
Machine Translation
Keywords
Hierarchical statistical machine translation, Reordering, Reordering labels, Soft constraints
Field
Rule-based machine translation, Terminal and nonterminal symbols, Computer science, Machine translation, Phrase, Natural language processing, Artificial intelligence, Syntax, Word order, Speech recognition, Grammar, Parsing, Linguistics
DocType
Journal
Volume
29
Issue
3-4
ISSN
1573-0573
Citations
1
PageRank
0.35
References
41
Authors
2
Name, Order, Citations, PageRank
Gideon Maillette de Buy Wenniger, 1, 17, 4.77
Khalil Sima'an, 2, 443, 50.32