Title
Integrating Knowledge Encoded by Linguistic Phenomena of Indian Languages with Neural Machine Translation.
Abstract
Machine Translation (MT) among Indian languages is a challenging problem owing to multiple factors, including the languages' morphological complexity and diversity and the lack of sufficient parallel data for most language pairs. Neural Machine Translation (NMT) is a rapidly advancing MT paradigm that has shown promising results for many language pairs, especially when large amounts of training data are available. We build 110 NMT systems for translation among 11 Indian languages, which is, to the best of our knowledge, the first effort in the direction of NMT for Indian languages. Since the condition of large parallel corpora is not met for most Indian languages, we also propose a method to employ additional linguistic knowledge encoded by different phenomena exhibited by Indian languages, such as Vibhakti and Sandhi, among others. We compare the results obtained by incorporating this knowledge with the baseline systems and demonstrate significant performance improvement. We observe that although NMT models are highly capable of learning language constructs, the use of specific features further helps in improving performance. To summarize, this paper demonstrates the use of NMT techniques for Indian languages, with an emphasis on the incorporation of specific linguistic knowledge to improve translation quality.
Year
2017
Venue
MIKE
Field
Rule-based machine translation, Example-based machine translation, Computer science, Machine translation, Sandhi, Parallel corpora, Natural language processing, Artificial intelligence, Training set, Language construct, Speech recognition, Linguistics, Performance improvement
DocType
Conference
Citations
1
PageRank
0.35
References
8
Authors
3
Name | Order | Citations | PageRank
Ruchit Agrawal | 1 | 1 | 0.35
Mihir Shekhar | 2 | 1 | 0.35
Dipti Misra Sharma | 3 | 262 | 45.90