Abstract |
---|
Deep Neural Networks (DNNs) have outperformed Gaussian Mixture Models (GMMs) and become the state-of-the-art technique for acoustic modeling. Various neural-network-based acoustic models have since been proposed to further improve speech recognition systems. However, these advances have not yet been adopted in research on Mongolian speech recognition. This study fills that gap. We study a series of neural-network-based acoustic models, apply them to Mongolian speech recognition systems, and compare their performance. We find that the Long Short-Term Memory (LSTM) model performs best among them. Finally, by combining the LSTM acoustic model with a data augmentation technique, which uses various combinations of Vocal Tract Length Normalization (VTLN) warping factors and time-warping factors to artificially expand the amount of data, we set a new record for Mongolian speech recognition. Compared with the best DNN-based speech recognition system, we cut the Word Error Rate (WER) nearly in half. |
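
The augmentation described in the abstract combines VTLN warping factors with time-warping factors. A minimal sketch of how such a grid of combinations might be enumerated is shown below. The factor values and the helper names `augmentation_configs` and `expanded_size` are hypothetical and chosen only for illustration; the abstract does not state which values or how many combinations the authors used.

```python
from itertools import product

# Hypothetical factor grids; the abstract does not give the values actually used.
VTLN_WARP_FACTORS = (0.9, 1.0, 1.1)
TIME_WARP_FACTORS = (0.9, 1.0, 1.1)


def augmentation_configs(vtln_factors, time_factors):
    """Return every (vtln, time) pair except the identity pair (1.0, 1.0)."""
    return [
        (v, t)
        for v, t in product(vtln_factors, time_factors)
        if (v, t) != (1.0, 1.0)
    ]


def expanded_size(n_utterances, configs):
    """Original utterances plus one perturbed copy per configuration."""
    return n_utterances * (1 + len(configs))


configs = augmentation_configs(VTLN_WARP_FACTORS, TIME_WARP_FACTORS)
```

Under these assumed grids, each utterance would gain one perturbed copy per non-identity pair, so the training set grows linearly with the number of combinations.
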
Year | DOI | Venue |
---|---|---|
2016 | 10.1109/IALP.2016.7875921 | 2016 International Conference on Asian Language Processing (IALP) |
Keywords | Field | DocType |
Neural Network (NN), Mongolian speech recognition, Long Short-Term Memory (LSTM), Vocal Tract Length Normalization (VTLN) | Pattern recognition, Computer science, Word error rate, Speech recognition, Time delay neural network, Speaker recognition, Artificial intelligence, Hidden Markov model, Artificial neural network, Mixture model, Vocal tract, Acoustic model | Conference |
ISSN | ISBN | Citations |
2159-1962 | 978-1-5090-0923-7 | 0 |
PageRank | References | Authors |
0.34 | 0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hongwei Zhang | 1 | 3 | 3.54 |
Fei Long | 2 | 16 | 13.09 |
Guanglai Gao | 3 | 78 | 24.57 |
Hui Zhang | 4 | 13 | 6.39 |