Title
Investigation of multilingual deep neural networks for spoken term detection
Abstract
The development of high-performance speech processing systems for low-resource languages is a challenging area. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to use bottleneck features, or hybrid systems, trained on multilingual data for speech-to-text (STT) systems. This paper presents an investigation into the application of these multilingual approaches to spoken term detection. Experiments used the IARPA Babel Limited Language Pack corpora (~10 hours per language): four languages for initial multilingual system development and one additional held-out target language. STT gains achieved by using multilingual bottleneck features in a Tandem configuration are shown to also apply to keyword search (KWS). Further improvements in both STT and KWS were observed by incorporating language questions into the Tandem GMM-HMM decision trees for the training set languages. Adapted hybrid systems performed slightly worse on average than the adapted Tandem systems. A language-independent acoustic model test on the target language showed that, at present, at least some retraining or adaptation of the acoustic models to the target language is needed to achieve reasonable performance.
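As a rough illustration of what a Tandem front end built on multilingual bottleneck features involves, here is a toy, untrained network in pure Python. Everything in it is an assumption for illustration: the layer sizes, the placeholder language names, the single shared hidden layer, and the helper names (`MultilingualBottleneckNet`, `tandem_features`) are hypothetical and do not reflect the configuration used in the paper. It shows two ideas the abstract relies on: the hidden and bottleneck layers are shared across the training languages while each language keeps its own output layer, and the Tandem feature for a frame is the original acoustic feature vector with the bottleneck activations appended, ready to be modelled by a GMM-HMM.

```python
import math
import random

# Hypothetical dimensions, chosen for illustration only.
PLP_DIM = 13           # acoustic (e.g. PLP) coefficients per frame (assumption)
HIDDEN_DIM = 32        # shared hidden layer width (assumption)
BN_DIM = 8             # bottleneck layer width (assumption)
LANGUAGES = ["lang_a", "lang_b", "lang_c", "lang_d"]  # placeholder training languages
TARGETS_PER_LANG = 10  # output units per language-specific layer (assumption)


def random_matrix(rows, cols, rng):
    """Small random weight matrix (rows x cols), stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def affine(weights, x):
    """Compute W @ x for a list-of-rows matrix W and vector x (bias omitted for brevity)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]


class MultilingualBottleneckNet:
    """Shared hidden layer and linear bottleneck, with one output layer per language."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_hidden = random_matrix(HIDDEN_DIM, PLP_DIM, rng)  # shared
        self.w_bn = random_matrix(BN_DIM, HIDDEN_DIM, rng)       # shared
        # Language-specific output layers: only these differ between languages.
        self.heads = {
            lang: random_matrix(TARGETS_PER_LANG, BN_DIM, rng) for lang in LANGUAGES
        }

    def bottleneck(self, frame):
        """Bottleneck activations for one frame; uses only the shared layers."""
        return affine(self.w_bn, sigmoid(affine(self.w_hidden, frame)))

    def posteriors(self, frame, lang):
        """Training-time output: posteriors over the given language's targets."""
        return softmax(affine(self.heads[lang], self.bottleneck(frame)))


def tandem_features(net, frame):
    """Tandem feature vector: the acoustic features with bottleneck features appended."""
    return list(frame) + net.bottleneck(frame)


if __name__ == "__main__":
    net = MultilingualBottleneckNet(seed=0)
    frame = [0.0] * PLP_DIM             # one dummy acoustic frame
    feat = tandem_features(net, frame)
    print(len(feat))                    # 13 + 8 = 21
```

Only the shared layers are needed at feature-extraction time, which is why bottleneck features from such a network can be computed for a held-out target language whose output layer was never trained. A real system would also train the weights, feed a context window of frames rather than a single frame, and often decorrelate the combined features before GMM-HMM training.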
Year
2013
DOI
10.1109/ASRU.2013.6707719
Venue
Automatic Speech Recognition and Understanding
Keywords
Gaussian processes, decision trees, hidden Markov models, mixture models, natural language processing, neural nets, speech recognition, speech synthesis, IARPA Babel limited language pack corpora, KWS, STT systems, Tandem configuration, high-performance speech processing systems, hybrid systems, initial multilingual system development, keyword search, language independent acoustic model test, low-resource languages, multilingual bottleneck features, multilingual deep neural networks, speech-to-text systems, spoken term detection, tandem GMM-HMM decision trees, training set languages, Multilingual, neural networks
Field
Speech processing, Decision tree, Bottleneck, Speech synthesis, Computer science, Speech recognition, Natural language processing, Artificial intelligence, Hidden Markov model, Artificial neural network, Hybrid system, Acoustic model
DocType
Conference
Citations
25
PageRank
0.89
References
18
Authors
6
Name | Order | Citations | PageRank
Kate Knill | 1 | 249 | 28.02
Mark J. F. Gales | 2 | 3905 | 367.45
Shakti P. Rath | 3 | 45 | 2.61
Philip C. Woodland | 4 | 4097 | 488.66
Chao Zhang | 5 | 95 | 9.70
Shixiong Zhang | 6 | 107 | 9.34