Title
Improving statistical word alignments with morpho-syntactic transformations
Abstract
This paper presents a wide range of statistical word alignment experiments that incorporate morphosyntactic information. By transforming the parallel corpus according to POS tags, lemmas or stems, we explore which linguistic information helps lower alignment error rates. To this end, alignments are evaluated against a human word alignment reference, with the aim of an improved machine translation training scheme that eventually leads to better SMT performance. Experiments are carried out on a Spanish–English European Parliament Proceedings parallel corpus, in both a large and a small data track. As expected, the improvements from introducing morphosyntactic information are larger when data is scarce, but a significant improvement is also achieved in the large data task, which indicates that certain linguistic knowledge remains relevant even when large amounts of data are available.
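The evaluation described in the abstract scores automatic alignments against a human reference. As a minimal sketch, assuming the commonly used alignment error rate definition with sure links S and possible links P, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), the metric could be computed as follows. The abstract does not state which exact definition the authors used, and the function name and toy links are hypothetical.

```python
def alignment_error_rate(hypothesis, sure, possible):
    """Return AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    Each argument is a collection of (source_position, target_position)
    links. This is an illustrative sketch, not the paper's own scorer.
    """
    a = set(hypothesis)
    s = set(sure)
    # Assumption: every sure link also counts as a possible link.
    p = set(possible) | s
    if not a and not s:
        # No links on either side: treat as zero error.
        return 0.0
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Hypothetical toy data for a single sentence pair.
sure = {(0, 0), (1, 1)}
possible = {(2, 1)}
hypothesis = {(0, 0), (2, 1), (2, 2)}
aer = alignment_error_rate(hypothesis, sure, possible)
```

Under this definition a lower value is better, and a hypothesis that reproduces exactly the sure links scores zero.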
Year
2006
DOI
10.1007/11816508_38
Venue
FinTAL
Keywords
data scarcity, morphosyntactic information, morpho-syntactic transformation, large data availability, alignment error rate, improving statistical word alignment, human word alignment reference, parallel corpus, small data track, statistical word alignment experiment, large data task, linguistic information, machine translation
Field
Rule-based machine translation, Lemmatisation, Small data, Computer science, Machine translation, Computational linguistics, Natural language, Natural language processing, Artificial intelligence, Parsing, Syntax
DocType
Conference
Volume
4139
ISSN
0302-9743
ISBN
3-540-37334-9
Citations
5
PageRank
0.51
References
17
Authors
8
Name, Order, Citations, PageRank
Adrià de Gispert, 1, 472, 35.22
Deepa Gupta, 2, 75, 11.91
Maja Popović, 3, 169, 13.09
Patrik Lambert, 4, 277, 23.36
José B. Mariño, 5, 510, 64.66
Marcello Federico, 6, 2420, 179.56
Hermann Ney, 7, 14178, 1506.93
Rafael Banchs, 8, 120, 8.91