Title
Improving bitext word alignments via syntax-based reordering of English
Abstract
We present an improved method for automated word alignment of parallel texts. The method exploits knowledge of syntactic divergences while avoiding the need for syntactic analysis of the less resource-rich language, and it retains the robustness of syntax-agnostic approaches such as the IBM word alignment models. We achieve this by using simple, easily elicited knowledge to build syntax-based heuristics that transform the target language (e.g. English) into a form more closely resembling the source language, and then aligning the transformed bitext with standard alignment methods. We present experimental results under varying resource conditions. The method improves word alignment performance for language pairs such as English-Korean and English-Hindi, which exhibit longer-distance syntactic divergences.
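The reorder-then-align idea in the abstract can be illustrated with a small sketch. The abstract does not list the actual heuristics, so everything here is an assumption made for illustration: the toy `Node` tree, the `HEAD_FINAL` rule set (move the head of a VP or PP to the end, roughly mimicking verb-final, postpositional order), and the helper names `leaf`, `reorder`, and `map_back`. The alignment step itself is not shown; any standard aligner would be run on the reordered English, and its links are then mapped back to the original word positions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Toy constituent. A leaf carries a word and its original position."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""
    index: int = -1


def leaf(label: str, word: str, index: int) -> Node:
    return Node(label=label, word=word, index=index)


# Hypothetical rule set (not taken from the paper): in these phrase types
# the head is assumed to be the first child and is moved to the end.
HEAD_FINAL = {"VP", "PP"}


def reorder(node: Node) -> List[Tuple[str, int]]:
    """Return (word, original_index) pairs in the transformed order."""
    if not node.children:
        return [(node.word, node.index)]
    kids = list(node.children)
    if node.label in HEAD_FINAL and len(kids) > 1:
        kids = kids[1:] + kids[:1]  # rotate the head child to the end
    out: List[Tuple[str, int]] = []
    for kid in kids:
        out.extend(reorder(kid))
    return out


def map_back(alignment: List[Tuple[int, int]],
             reordered: List[Tuple[str, int]]) -> List[Tuple[int, int]]:
    """Map (source_pos, reordered_english_pos) links to original English positions."""
    return sorted((s, reordered[e][1]) for s, e in alignment)


# "John went to school" as a toy parse.
tree = Node("S", [
    Node("NP", [leaf("NNP", "John", 0)]),
    Node("VP", [
        leaf("VBD", "went", 1),
        Node("PP", [leaf("TO", "to", 2),
                    Node("NP", [leaf("NN", "school", 3)])]),
    ]),
])

reordered = reorder(tree)
print(" ".join(w for w, _ in reordered))

# Pretend a standard aligner returned a monotone alignment on the
# reordered text; convert it to links on the original English order.
links = map_back([(0, 0), (1, 1), (2, 2), (3, 3)], reordered)
print(links)
```

Carrying the original index with each word keeps the transformation reversible, so the final alignments refer to the untouched English sentence rather than the reordered one.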
Year
2004
DOI
10.3115/1219044.1219058
Venue
ACL (Poster and Demonstration)
Keywords
resource rich language, standard alignment method, source language, target language, improving bitext word alignment, syntactic analysis, exhibit longer-distance syntactic divergence, language pair, word alignment performance, syntax-based reordering, ibm word alignment model, automated word alignment
Field
IBM, Computer science, Speech recognition, Robustness (computer science), Heuristics, Natural language processing, Artificial intelligence, Parsing, Syntax
DocType
Conference
Volume
P04-3
Citations
3
PageRank
0.39
References
9
Authors
2
Name | Order | Citations | PageRank
Elliott Franco Drábek | 1 | 206 | 16.02
David Yarowsky | 2 | 3986 | 618.81