Title |
---|
Improved End-To-End Speech Recognition Using Adaptive Per-Dimensional Learning Rate Methods |
Abstract |
---|
The introduction of deep neural networks (DNNs) has led to a significant improvement in automatic speech recognition (ASR) performance. However, the overall ASR system remains complicated due to its dependence on the hidden Markov model (HMM). Recently, a new end-to-end ASR framework was proposed that uses recurrent neural networks (RNNs) to directly model context-independent targets with the connectionist temporal classification (CTC) objective function, achieving results comparable to the hybrid HMM/DNN system. In this paper, we investigate per-dimensional learning rate methods, including ADAGRAD and ADADELTA, to improve the recognition accuracy of the end-to-end system, motivated by the fact that the blank symbol used in the CTC technique dominates the output and these methods assign small learning rates to frequent features. Experimental results show that using ADADELTA achieves a relative word error rate (WER) reduction of more than 4% and an absolute improvement of 5% in label accuracy on the training set, while requiring fewer training epochs. |
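The per-dimensional behavior the abstract relies on can be illustrated with the ADADELTA update rule (Zeiler, 2012): each parameter dimension keeps a decaying average of its squared gradients, so dimensions updated frequently and strongly (such as those driving the dominant CTC blank output) accumulate a large denominator and receive smaller effective steps. The sketch below is a minimal NumPy illustration of the standard ADADELTA rule, not the paper's actual training code; all names and hyperparameter values are illustrative.

```python
import numpy as np

def adadelta_step(x, grad, acc_g, acc_dx, rho=0.95, eps=1e-6):
    """One ADADELTA update for parameter vector x.

    acc_g  -- running average of squared gradients, E[g^2]
    acc_dx -- running average of squared updates,   E[dx^2]
    Dimensions with persistently large gradients grow acc_g
    and therefore take smaller effective steps; no global
    learning rate is needed.
    """
    acc_g = rho * acc_g + (1.0 - rho) * grad ** 2
    dx = -np.sqrt(acc_dx + eps) / np.sqrt(acc_g + eps) * grad
    acc_dx = rho * acc_dx + (1.0 - rho) * dx ** 2
    return x + dx, acc_g, acc_dx

# Toy usage: minimize f(x) = x1^2 + x2^2 from an arbitrary start.
x = np.array([1.0, -2.0])
acc_g = np.zeros_like(x)
acc_dx = np.zeros_like(x)
for _ in range(200):
    grad = 2.0 * x                      # gradient of the quadratic
    x, acc_g, acc_dx = adadelta_step(x, grad, acc_g, acc_dx)
```

ADAGRAD differs in that it sums squared gradients without decay, so its effective learning rate shrinks monotonically; ADADELTA's exponential decay avoids that monotone shrinkage, which matches the paper's finding that ADADELTA trains in fewer epochs.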
Year | DOI | Venue |
---|---|---|
2016 | 10.1587/transinf.2016SLL0001 | IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS |
Keywords | Field | DocType
---|---|---|
connectionist temporal classification, adaptive per-dimensional learning rate method, end-to-end ASR | Computer vision, Pattern recognition, Computer science, End-to-end principle, Speech recognition, Artificial intelligence | Journal |
Volume | Issue | ISSN
---|---|---|
E99D | 10 | 1745-1361 |
Citations | PageRank | References
---|---|---|
0 | 0.34 | 5 |
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Xuyang Wang | 1 | 2 | 1.38 |
Pengyuan Zhang | 2 | 50 | 19.46 |
Qingwei Zhao | 3 | 80 | 20.70 |
Jielin Pan | 4 | 44 | 18.04 |
Yonghong Yan | 5 | 656 | 114.13 |