Title |
---|
Statistical Machine Translation From And Into Morphologically Rich And Low Resourced Languages |
Abstract |
---|
In this paper, we consider the challenging problem of automatic machine translation between a language pair in which both languages are morphologically rich and low-resourced: Sinhala and Tamil. We build a phrase-based Statistical Machine Translation (SMT) system and attempt to enhance it with unsupervised morphological analysis. When translating between these two languages, morphological variation produces large numbers of out-of-vocabulary (OOV) terms between the training and test sets, leading to reduced BLEU scores in evaluation. This early work shows that unsupervised morphological analysis using the Morfessor algorithm, which extracts morpheme-like units, can significantly reduce the OOV problem and help improve translation. |
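The mechanism described in the abstract is that segmenting words into morpheme-like units lets an unseen inflected form break down into units that did occur in training, which lowers the OOV rate. The sketch below illustrates only that effect. It does not use Morfessor: `toy_segment`, `TOY_SUFFIXES`, and the `+` suffix marker are hypothetical stand-ins for a learned segmenter, and the English toy sentences are illustrative, not data from the paper.

```python
from typing import Callable, Iterable, List

# Hypothetical hand-written suffix list: a toy stand-in for an unsupervised
# segmenter such as Morfessor, which would learn its units from data instead.
TOY_SUFFIXES = ("ing", "ed", "s")


def toy_segment(word: str) -> List[str]:
    """Split off at most one known suffix, marked with '+' (illustrative only)."""
    for suffix in TOY_SUFFIXES:
        # Require a stem of at least 3 characters so short words stay intact.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return [word[: -len(suffix)], "+" + suffix]
    return [word]


def oov_rate(train: Iterable[str], test: Iterable[str],
             segment: Callable[[str], List[str]] = lambda w: [w]) -> float:
    """Fraction of test units (after segmentation) never seen in training."""
    vocab = {unit for word in train for unit in segment(word)}
    test_units = [unit for word in test for unit in segment(word)]
    if not test_units:
        return 0.0
    unseen = sum(1 for unit in test_units if unit not in vocab)
    return unseen / len(test_units)


train_words = "they walked and she talks".split()
test_words = "she walks and they talked".split()

print(oov_rate(train_words, test_words))               # word level: 0.4
print(oov_rate(train_words, test_words, toy_segment))  # segmented: 0.0
```

At the word level, "walks" and "talked" never appear in training, so 2 of 5 test tokens are OOV. After segmentation, every test unit ("walk", "+s", "talk", "+ed", ...) was already seen, so the OOV rate drops to zero. A real system would learn the segmentation from the training corpus rather than from a fixed suffix list.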
Year | DOI | Venue |
---|---|---|
2015 | 10.1007/978-3-319-18111-0_41 | COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT I |
Field | DocType | Volume |
---|---|---|
BLEU, Tamil, Computer science, Machine translation, Phrase, Natural language processing, Artificial intelligence, Baseline system | Conference | 9041 |
ISSN | Citations | PageRank |
---|---|---|
0302-9743 | 0 | 0.34 |
References | Authors |
---|---|
15 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Randil Pushpananda | 1 | 0 | 1.01 |
Ruvan Weerasinghe | 2 | 0 | 0.34 |
Mahesan Niranjan | 3 | 775 | 120.43 |