Abstract |
---|
Statistical phrase-based machine translation requires no linguistic information beyond word-aligned parallel corpora (Zens et al., 2002; Koehn et al., 2003). Unfortunately, this linguistic agnosticism often produces ungrammatical translations. Syntax, or sentence structure, could guide phrase-based systems, but the "non-constituent" word strings that phrase-based decoders manipulate complicate the use of most recursive syntactic tools. We address these issues with Combinatory Categorial Grammar (CCG; Steedman, 2000), which has a much more flexible notion of constituency and therefore provides labels for more of the putative non-constituent multiword translation phrases. Using CCG parse charts, we label phrase table entries with multiword labels, train a syntactic analogue of a lexicalized reordering model on them, and demonstrate significant improvements in translating between Urdu and English, two languages with divergent sentence structure. |
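
A lexicalized reordering model estimates how likely a phrase is to be placed monotone (M), swapped (S), or discontinuous (D) relative to the previously translated phrase. The abstract describes a syntactic analogue that attaches CCG labels to phrase table entries. The sketch below is a minimal illustration of that idea under stated assumptions, and it is not the authors' model. It assumes the phrase segmentations and CCG labels are already given. It conditions the orientation on the label alone instead of on the phrase pair, and it uses add-alpha smoothing. The toy data, the labels, the `alpha` value, and the names `orientation` and `train_label_reordering` are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical input: one phrase per tuple, listed in TARGET order, giving the
# phrase's source span [s_start, s_end] (inclusive, 0-based) and an assumed CCG
# label, e.g. "S/NP" for a CCG-derivable non-constituent such as "John likes".
Phrase = Tuple[int, int, str]

def orientation(prev: Tuple[int, int], cur: Tuple[int, int]) -> str:
    """Monotone/swap/discontinuous orientation of cur relative to prev."""
    prev_start, prev_end = prev
    cur_start, cur_end = cur
    if cur_start == prev_end + 1:
        return "M"  # continues directly after the previous source span
    if cur_end + 1 == prev_start:
        return "S"  # sits directly before the previous source span
    return "D"      # any other jump

def train_label_reordering(sentences: List[List[Phrase]],
                           alpha: float = 0.5) -> Dict[str, Dict[str, float]]:
    """Estimate p(orientation | label) by add-alpha relative frequency."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for phrases in sentences:
        prev = (-1, -1)  # sentence-start sentinel: a phrase at source 0 is monotone
        for s_start, s_end, label in phrases:
            counts[label][orientation(prev, (s_start, s_end))] += 1
            prev = (s_start, s_end)
    model: Dict[str, Dict[str, float]] = {}
    for label, c in counts.items():
        total = sum(c.values()) + 3 * alpha
        model[label] = {o: (c[o] + alpha) / total for o in ("M", "S", "D")}
    return model

# Toy data (hypothetical). Sentence 1: an SOV source order "John Mary likes"
# translated as SVO "John likes Mary". Sentence 2: a monotone pair in which
# "John likes" is one phrase labelled S/NP.
sentences = [
    [(0, 0, "NP"), (2, 2, r"(S\NP)/NP"), (1, 1, "NP")],
    [(0, 1, "S/NP"), (2, 2, "NP")],
]
model = train_label_reordering(sentences)
print(model["NP"])  # NP: M ≈ 0.56, S ≈ 0.33, D ≈ 0.11
```

Conditioning on a label instead of on the full phrase pair lets rare phrases share reordering statistics with every other phrase that carries the same category. A flexible grammar such as CCG makes this useful because it can label many more phrases than a standard constituency grammar can.
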
Year | Venue | Keywords |
---|---|---|
2012 | WMT@NAACL-HLT | ccg syntactic reordering model, putative non-constituent multiword translation, recursive syntactic tool, phrase-based system, statistical phrase-based machine translation, multiword label, divergent sentence structure, linguistic agnosticism, linguistic information, phrase-based decoder, ccg parse chart |
Field | DocType | Citations |
---|---|---|
Rule-based machine translation, Computer science, Machine translation, Phrase, Combinatory categorial grammar, Artificial intelligence, Natural language processing, Parsing, Linguistics, Syntax, Sentence, Recursion | Conference | 1 |
PageRank | References | Authors |
---|---|---|
0.34 | 26 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Dennis N. Mehay | 1 | 3 | 1.06 |
Chris Brew | 2 | 321 | 44.44 |