Title |
---|
Integrating Knowledge Encoded by Linguistic Phenomena of Indian Languages with Neural Machine Translation. |
Abstract |
---|
Machine Translation (MT) among Indian languages is a challenging problem owing to several factors, including the morphological complexity and diversity of these languages and the lack of sufficient parallel data for most language pairs. Neural Machine Translation (NMT) is a rapidly advancing MT paradigm that has shown promising results for many language pairs, especially when large amounts of training data are available. We build 110 NMT systems for translation among 11 Indian languages; to the best of our knowledge, this is the first effort toward NMT for Indian languages. Since the condition of large parallel corpora is not met for most Indian languages, we also propose a method to employ additional linguistic knowledge encoded by phenomena that Indian languages exhibit, such as Vibhakti and Sandhi. We compare the results obtained by incorporating this knowledge against the baseline systems and demonstrate significant performance improvement. We observe that although NMT models have a strong capacity to learn language constructs, the use of specific features further helps improve performance. To summarize, this paper demonstrates the use of NMT techniques for Indian languages, with an emphasis on incorporating specific linguistic knowledge to improve translation quality. |
Year | Venue | Field |
---|---|---|
2017 | MIKE | Rule-based machine translation, Example-based machine translation, Computer science, Machine translation, Sandhi, Parallel corpora, Natural language processing, Artificial intelligence, Training set, Language construct, Speech recognition, Linguistics, Performance improvement |
DocType | Citations | PageRank |
---|---|---|
Conference | 1 | 0.35 |
References | Authors |
---|---|
8 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ruchit Agrawal | 1 | 1 | 0.35 |
Mihir Shekhar | 2 | 1 | 0.35 |
Dipti Misra Sharma | 3 | 262 | 45.90 |