Abstract |
---|
Automatic speech recognition (ASR) systems combining neural networks with hidden Markov models (NN/HMM) have achieved state-of-the-art results on various benchmarks, but most rely on large amounts of training data. ASR therefore remains difficult for low-resource languages such as Khalkha Mongolian. Transfer learning methods have been shown to exploit out-of-domain data effectively to improve ASR performance in similarly data-scarce settings. In this paper, we investigate two weight transfer approaches to improve Khalkha Mongolian ASR based on lattice-free maximum mutual information (LF-MMI). Moreover, i-vector features are combined with MFCC features as input to validate the effectiveness of the Khalkha Mongolian ASR transfer models. Experimental results show that weight transfer with out-of-domain Chahar speech achieves substantial improvements over the baseline model on Khalkha speech, and that transferring part of the model performs better than transferring the whole model. Furthermore, splicing i-vectors with MFCCs as input features can further enhance the performance of the acoustic model. The WER of the best model is relatively reduced by 10.96% compared with the in-domain Khalkha speech baseline model. |
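The i-vector/MFCC combination mentioned in the abstract is typically done by repeating a single utterance-level (or speaker-level) i-vector across all frames and concatenating it with each frame's MFCC vector. A minimal sketch of that splicing step, assuming illustrative dimensions (40-dim MFCCs, 100-dim i-vector) that are not stated in the paper:

```python
import numpy as np

# Hypothetical shapes: 40-dim MFCC frames for one utterance, plus one
# 100-dim i-vector summarizing speaker/session characteristics.
num_frames = 200
mfcc = np.random.randn(num_frames, 40)   # frame-level spectral features
ivector = np.random.randn(100)           # utterance-level embedding

# Splice: repeat the i-vector for every frame and concatenate it with the
# MFCCs, so each input frame carries both spectral and speaker information.
ivector_tiled = np.tile(ivector, (num_frames, 1))  # (200, 100)
features = np.concatenate([mfcc, ivector_tiled], axis=1)

print(features.shape)  # (200, 140)
```

The resulting 140-dim frames would then be fed to the acoustic model in place of plain MFCCs.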
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/IALP.2018.8629237 | 2018 International Conference on Asian Language Processing (IALP) |
Keywords | Field | DocType
---|---|---|
Mongolian, speech recognition, weight transfer | Data modeling, Weight transfer, Computer science, Transfer of learning, Speech recognition, Time delay neural network, Mutual information, Hidden Markov model, Artificial neural network, Acoustic model | Conference
ISSN | ISBN | Citations
---|---|---|
2159-1962 | 978-1-5386-8298-2 | 0
PageRank | References | Authors
---|---|---|
0.34 | 0 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Linyan Shi | 1 | 0 | 0.34 |
Fei Long | 2 | 16 | 13.09 |
Yonghe Wang | 3 | 0 | 2.37 |
Guanglai Gao | 4 | 78 | 24.57 |