Abstract | ||
---|---|---|
The development of high-performance speech processing systems for low-resource languages is a challenging area. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to use bottleneck features, or hybrid systems, trained on multilingual data for speech-to-text (STT) systems. This paper presents an investigation into the application of these multilingual approaches to spoken term detection. Experiments were run using the IARPA Babel limited language pack corpora (~10 hours/language) with 4 languages for initial multilingual system development and an additional held-out target language. STT gains achieved through using multilingual bottleneck features in a Tandem configuration are shown to also apply to keyword search (KWS). Further improvements in both STT and KWS were observed by incorporating language questions into the Tandem GMM-HMM decision trees for the training set languages. Adapted hybrid systems performed slightly worse on average than the adapted Tandem systems. A language-independent acoustic model test on the target language showed that retraining or adapting the acoustic models to the target language is currently the minimum needed to achieve reasonable performance. |
Year | DOI | Venue |
---|---|---|
2013 | 10.1109/ASRU.2013.6707719 | Automatic Speech Recognition and Understanding |
Keywords | Field | DocType |
Gaussian processes,decision trees,hidden Markov models,mixture models,natural language processing,neural nets,speech recognition,speech synthesis,IARPA Babel limited language pack corpora,KWS,STT systems,Tandem configuration,high-performance speech processing systems,hybrid systems,initial multilingual system development,keyword search,language independent acoustic model test,low-resource languages,multilingual bottleneck features,multilingual deep neural networks,speech-to-text systems,spoken term detection,tandem GMM-HMM decision trees,training set languages,Multilingual,neural networks | Speech processing,Decision tree,Bottleneck,Speech synthesis,Computer science,Speech recognition,Natural language processing,Artificial intelligence,Hidden Markov model,Artificial neural network,Hybrid system,Acoustic model | Conference |
Citations | PageRank | References |
25 | 0.89 | 18 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kate Knill | 1 | 249 | 28.02 |
Mark J. F. Gales | 2 | 3905 | 367.45 |
Shakti P. Rath | 3 | 45 | 2.61 |
Philip C. Woodland | 4 | 4097 | 488.66 |
Chao Zhang | 5 | 95 | 9.70 |
Shixiong Zhang | 6 | 107 | 9.34 |