Title |
---|
Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM |
Abstract |
---|
We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder. During the beam search process, we combine the CTC predictions, the attention-based decoder predictions, and a separately trained LSTM language model. We achieve a 5-10% error reduction compared to prior systems on spontaneous Japanese and Chinese speech, and our end-to-end model outperforms traditional hybrid ASR systems. |
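The abstract describes combining three scores during beam search: the CTC prediction, the attention-based decoder prediction, and a separately trained LSTM language model. A minimal sketch of that score combination is below; the function name and the weight values `lam` and `beta` are illustrative assumptions, not taken from the paper.

```python
import math

def joint_score(log_p_ctc, log_p_att, log_p_lm, lam=0.3, beta=0.1):
    """Score one beam-search hypothesis by interpolating the CTC and
    attention-decoder log-probabilities, then adding a weighted LM term.
    `lam` and `beta` are illustrative weights, tuned on held-out data
    in practice."""
    return lam * log_p_ctc + (1.0 - lam) * log_p_att + beta * log_p_lm

# Illustrative use: rank candidate hypotheses by their joint score.
candidates = {
    "hyp_a": (-2.0, -1.5, -3.0),  # (log p_ctc, log p_att, log p_lm)
    "hyp_b": (-1.0, -2.5, -2.0),
}
best = max(candidates, key=lambda h: joint_score(*candidates[h]))
```

During decoding, this score would be evaluated for each partial hypothesis on the beam, so that hypotheses favored by all three models survive pruning.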
Year | DOI | Venue |
---|---|---|
2017 | 10.21437/Interspeech.2017-1296 | 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION |
Keywords | DocType | Volume
---|---|---
end-to-end speech recognition, encoder-decoder, connectionist temporal classification, attention model | Conference | abs/1706.02737
ISSN | Citations | PageRank
---|---|---
2308-457X | 28 | 1.19
References | Authors
---|---
11 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Takaaki Hori | 1 | 408 | 45.58 |
Shinji Watanabe | 2 | 1158 | 139.38 |
Yu Zhang | 3 | 442 | 41.79 |
William Chan | 4 | 357 | 24.67 |