Title
Statistical Machine Translation From And Into Morphologically Rich And Low Resourced Languages
Abstract
In this paper, we consider the challenging problem of automatic machine translation between a pair of languages that are both morphologically rich and low resourced: Sinhala and Tamil. We build a phrase-based Statistical Machine Translation (SMT) system and attempt to enhance it with unsupervised morphological analysis. When translating between these languages, morphological variation produces large numbers of out-of-vocabulary (OOV) terms between the training and test sets, which lowers BLEU scores in evaluation. This early work shows that unsupervised morphological analysis with the Morfessor algorithm, which extracts morpheme-like units, significantly reduces the OOV problem and helps improve translation.
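To make the OOV effect described in the abstract concrete, here is a minimal sketch of how an OOV rate might be measured before and after splitting words into morpheme-like units. It is not the authors' pipeline: the `toy_segment` function and its suffix list are hypothetical stand-ins for a trained Morfessor model, and the English toy tokens are invented for illustration in place of real Sinhala or Tamil data.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur among the training tokens."""
    vocab = set(train_tokens)
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


# Hypothetical suffix list: a crude stand-in for an unsupervised segmenter
# such as Morfessor, used only so the example stays self-contained.
TOY_SUFFIXES = ("ing", "ed", "s")


def toy_segment(word):
    """Split off one known suffix if the remaining stem has 3+ characters."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return [word[: -len(suffix)], "+" + suffix]
    return [word]


def segment_all(tokens):
    """Replace every token by its morpheme-like pieces."""
    return [piece for tok in tokens for piece in toy_segment(tok)]


# Invented toy data: the test set reuses stems and suffixes seen in training,
# but mostly in combinations that never appear as whole words.
train = ["walk", "walked", "talks", "jumping"]
test = ["walking", "talked", "jumps", "walk"]

word_oov = oov_rate(train, test)
morph_oov = oov_rate(segment_all(train), segment_all(test))

print(f"word-level OOV rate:     {word_oov:.2f}")
print(f"morpheme-level OOV rate: {morph_oov:.2f}")
```

On this toy data, three of the four test words are unseen as whole words, while every stem and suffix piece has been seen in training. That is the mechanism the abstract relies on, although real corpora will show a much less clean reduction.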
Year: 2015
DOI: 10.1007/978-3-319-18111-0_41
Venue: COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT I
Field: BLEU, Tamil, Computer science, Machine translation, Phrase, Natural language processing, Artificial intelligence, Baseline system
DocType: Conference
Volume: 9041
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 15
Authors: 3
Authors (Name, Order, Citations, PageRank)
Randil Pushpananda, 1, 0, 1.01
Ruvan Weerasinghe, 2, 0, 0.34
Mahesan Niranjan, 3, 775, 120.43