Investigation of multilingual deep neural networks for spoken term detection - Citegraph

Paper Info

Title
Investigation of multilingual deep neural networks for spoken term detection

Abstract
The development of high-performance speech processing systems for low-resource languages is a challenging area. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to use bottleneck features, or hybrid systems, trained on multilingual data for speech-to-text (STT) systems. This paper presents an investigation into the application of these multilingual approaches to spoken term detection. Experiments were run using the IARPA Babel limited language pack corpora (~10 hours/language) with 4 languages for initial multilingual system development and an additional held-out target language. STT gains achieved through using multilingual bottleneck features in a Tandem configuration are shown to also apply to keyword search (KWS). Further improvements in both STT and KWS were observed by incorporating language questions into the Tandem GMM-HMM decision trees for the training set languages. Adapted hybrid systems performed slightly worse on average than the adapted Tandem systems. A language independent acoustic model test on the target language showed that retraining or adapting of the acoustic models to the target language is currently minimally needed to achieve reasonable performance.

Year	DOI	Venue
2013	10.1109/ASRU.2013.6707719	Automatic Speech Recognition and Understanding

Keywords
Gaussian processes, decision trees, hidden Markov models, mixture models, natural language processing, neural nets, speech recognition, speech synthesis, IARPA Babel limited language pack corpora, KWS, STT systems, Tandem configuration, high-performance speech processing systems, hybrid systems, initial multilingual system development, keyword search, language independent acoustic model test, low-resource languages, multilingual bottleneck features, multilingual deep neural networks, speech-to-text systems, spoken term detection, tandem GMM-HMM decision trees, training set languages, Multilingual, keyword search, neural networks, speech recognition, spoken term detection

Field
Speech processing, Decision tree, Bottleneck, Speech synthesis, Computer science, Speech recognition, Natural language processing, Artificial intelligence, Hidden Markov model, Artificial neural network, Hybrid system, Acoustic model

DocType
Conference

Citations	PageRank	References
25	0.89	18
Authors
6

Authors (6 rows)

Name	Order	Citations	PageRank
Kate Knill	1	249	28.02
Mark J. F. Gales	2	3905	367.45
Shakti P. Rath	3	45	2.61
Philip C. Woodland	4	4097	488.66
Chao Zhang	5	95	9.70
Shixiong Zhang	6	107	9.34
