Abstract |
---|
Building Automatic Speech Recognition (ASR) systems for low-resource languages such as Indic languages poses several challenges. One is obtaining the large amounts of training data required to build Acoustic Models (AMs) from scratch. In the context of Indian English, another challenge is code-mixing, as many Indian speakers are multilingual and mix languages in their speech. Recognizing named entities poses similar challenges, since the entities are often of Hindi origin. In this paper we address the problem of training an AM for Hindi with limited data, starting from a well-trained English model. We do this in two steps: first, we expand the phone set of the English model to include Hindi phones and train it on samples collected from Indian speakers. We show that this step mitigates some of the issues with code-mixing and named-entity recognition, and also serves as a base model for the second step, in which we train a Hindi AM. |
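The abstract's first step, expanding the English model's phone set with Hindi phones so that code-mixed words and Hindi-origin named entities become transcribable, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the phone symbols, lexicon entries, and helper names are all hypothetical placeholders.

```python
# Hedged sketch of phone-set expansion for a bootstrapped acoustic model.
# All phone symbols below are illustrative, not the paper's actual inventory.

english_phones = {"AA", "AE", "D", "EH", "IY", "K", "L", "N", "P", "T"}
hindi_extra_phones = {"dd", "tt", "nn", "q"}  # hypothetical retroflex/nasal additions

def expand_phoneset(base_phones, new_phones):
    """Return the union of the base-language and target-language phone sets."""
    return set(base_phones) | set(new_phones)

expanded = expand_phoneset(english_phones, hindi_extra_phones)

# Toy lexicon: the Hindi-origin entity uses one of the newly added phones,
# so it is only transcribable with the expanded phone set.
lexicon = {
    "DELHI": ["D", "EH", "L", "IY"],
    "PATNA": ["P", "AA", "tt", "N", "AA"],
}

def lexicon_is_covered(lexicon, phoneset):
    """Check that every pronunciation uses only phones the model knows."""
    return all(p in phoneset for pron in lexicon.values() for p in pron)
```

With this setup, `lexicon_is_covered(lexicon, english_phones)` fails while `lexicon_is_covered(lexicon, expanded)` succeeds, mirroring why the expanded English model can serve as a base for Hindi training.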
Year | Venue | Keywords |
---|---|---|
2017 | National Conference on Communications (NCC) | Automatic Speech Recognition, Acoustic Modeling, Low Resource Languages, Bootstrapping |
Field | DocType | Citations |
---|---|---|
Training set, Data modeling, Indian English, Hindi, Bootstrapping, Computer science, Speech recognition, Natural language processing, Artificial intelligence, Hidden Markov model, Named-entity recognition | Conference | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 5 | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Prabhat Pandey | 1 | 0 | 0.34 |
Praful Hebbar | 2 | 1 | 0.68 |
Prashant Borole | 3 | 1 | 0.68 |
Sandeep Satpal | 4 | 44 | 1.35 |
Raveesh Motlani | 5 | 0 | 0.34 |
Rupesh Mehta | 6 | 84 | 4.96 |
Niranjan Nayak | 7 | 1 | 1.70 |
Radhakrishnan Srikanth | 8 | 0 | 0.34 |