Title
Speech recognition in a dialog system: from conventional to deep processing - A case study applied to Spanish.
Abstract
The aim of this paper is to present an overview of the automatic speech recognition (ASR) module in a spoken dialog system and to show how it has evolved from the conventional GMM-HMM (Gaussian mixture model - hidden Markov model) architecture toward the more recent nonlinear DNN-HMM (deep neural network - hidden Markov model) scheme. GMMs long served as the baseline for speech recognition, but in recent years, with the resurgence of artificial neural networks (ANNs), they have been surpassed in most recognition tasks. An important consideration for ANN-based acoustic models is that their weights can be adjusted in two training steps: i) initialization of the weights (with or without pre-training) and ii) fine-tuning. To exemplify these frameworks, a case study is carried out with the Kaldi toolkit, using a mid-size vocabulary and a personalized speaker-independent voice corpus in a connected-word phone-dialing environment, for the recognition of digit strings and personal name lists in Mexican Spanish. The results show reasonable speech recognition accuracy with DNN acoustic modeling. The context-dependent DNN-HMM achieves a word error rate (WER) of 1.49%, a 30% relative improvement over the best GMM-HMM result in these experiments (2.12% WER).
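The abstract reports results as word error rate and as a relative improvement between two systems. The minimal Python sketch below assumes the standard definition of WER: the word-level Levenshtein distance (substitutions + deletions + insertions) summed over a test set, divided by the total number of reference words. The function names and the example utterances are hypothetical illustrations, not taken from the paper, its corpus, or Kaldi; only the two WER figures (2.12% and 1.49%) come from the abstract.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn the reference into the hypothesis (Levenshtein distance)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match if equal
            )
        prev = curr
    return prev[-1]


def wer(references, hypotheses) -> float:
    """Corpus-level WER in percent: total word errors / total reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    if words == 0:
        raise ValueError("reference transcriptions contain no words")
    return 100.0 * errors / words


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent with respect to the baseline system."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical utterances, not from the paper's corpus.
    refs = ["cinco cinco uno dos", "llamar a juan pérez"]
    hyps = ["cinco uno dos tres", "llamar a juan pérez"]
    print(f"WER: {wer(refs, hyps):.2f}%")  # 2 errors / 8 reference words
    # WER figures reported in the abstract: GMM-HMM 2.12%, DNN-HMM 1.49%.
    print(f"Relative improvement: {relative_improvement(2.12, 1.49):.1f}%")
```

With the abstract's figures, (2.12 - 1.49) / 2.12 is about 29.7%, which rounds to the reported 30% relative improvement. Note that the relative gain says nothing about absolute difficulty: the same 0.63-point absolute reduction would be a much smaller relative gain on a harder task with a higher baseline WER.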
Year: 2018
Venue: Multimedia Tools Appl.
Field: Computer science, Word error rate, Speech recognition, Speaker recognition, Dialog system, Natural language processing, Artificial intelligence, Deep learning, Hidden Markov model, Artificial neural network, Mixture model, Acoustic model
Volume: 77
Issue: 12
Citations: 1
PageRank: 0.35
References: 41
Authors: 3
Aldonso Becerra (order 1, citations 2, PageRank 1.43)
José Ismael de la Rosa Vargas (order 2, citations 10, PageRank 5.29)
Efrén González (order 3, citations 3, PageRank 2.79)