Title
Research on Mongolian Speech Recognition Based on FSMN.
Abstract
Deep neural network (DNN) models have achieved significant results on the Mongolian speech recognition task; however, compared with Chinese, English, and other languages, there is still room for improvement. This paper presents the first application of the Feed-forward Sequential Memory Network (FSMN) to Mongolian speech recognition, modeling long-term dependencies in time series without recurrent feedback. Furthermore, by modeling the speaker in the feature space, we extract i-vector features and combine them with Fbank features as the input to validate their effectiveness in Mongolian ASR tasks. Finally, discriminative training is applied to the FSMN for the first time, using the maximum mutual information (MMI) and state-level minimum Bayes risk (sMBR) criteria, respectively. The experimental results show that the FSMN outperforms the DNN in Mongolian ASR and that, with combined i-vector and Fbank features as FSMN input and discriminative training, the word error rate (WER) is reduced by 17.9% relative to the DNN baseline.
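As an illustration of how an FSMN can capture context without recurrent feedback, the following is a minimal sketch of a unidirectional scalar memory block, in which each frame's memory is a weighted sum of the current and previous hidden activations. It follows the general FSMN formulation rather than this paper's specific configuration; the function name `fsmn_memory`, the tap values, and the toy frames are all assumptions made for illustration.

```python
def fsmn_memory(hidden, taps):
    """Unidirectional scalar FSMN memory block (illustrative sketch).

    hidden: list of T frames, each a list of D floats (hidden activations).
    taps:   list of N+1 scalar coefficients a_0..a_N (learned in practice,
            fixed here for illustration).
    Returns m with m[t] = sum_i taps[i] * hidden[t - i], skipping t - i < 0.
    """
    dim = len(hidden[0]) if hidden else 0
    memory = []
    for t in range(len(hidden)):
        acc = [0.0] * dim
        for i, a in enumerate(taps):
            if t - i < 0:
                break  # no frames exist before the start of the utterance
            for d in range(dim):
                acc[d] += a * hidden[t - i][d]
        memory.append(acc)
    return memory


# Hypothetical toy input: 3 frames of 2-dimensional activations.
frames = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
print(fsmn_memory(frames, [1.0, 0.5]))
```

Because the memory is a finite weighted sum over past frames, it behaves like a learnable FIR filter and needs no feedback connection, so standard feed-forward training applies.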
Year
2017
DOI
10.1007/978-3-319-73618-1_21
Venue
Lecture Notes in Artificial Intelligence
Keywords
Mongolian, Speech recognition, DNN, FSMN, i-vector, Sequence-criterion training
DocType
Conference
Volume
10619
ISSN
0302-9743
Citations
0
PageRank
0.34
References
0
Authors
4
Name            Order  Citations  PageRank
Yonghe Wang     1      0          2.37
Fei Long        2      16         13.09
Hongwei Zhang   3      3          3.54
Guanglai Gao    4      78         24.57