Title
Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM
Abstract
We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder. During beam search, we combine the CTC predictions, the attention-based decoder predictions, and a separately trained LSTM language model. We achieve a 5-10% error reduction compared to prior systems on spontaneous Japanese and Chinese speech, and our end-to-end model outperforms conventional hybrid ASR systems.
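The abstract describes a beam search that combines three scores per hypothesis. Below is a minimal sketch of such a log-linear combination, assuming hypothetical scorer callables `ctc_score`, `att_score`, and `lm_score` and illustrative weights `lam` and `gamma`; the paper's exact interpolation, prefix scoring, and pruning may differ.

```python
def joint_score(hyp, ctc_score, att_score, lm_score, lam=0.3, gamma=0.3):
    """Log-linear combination of CTC, attention-decoder, and LSTM-LM
    log-probabilities for one partial hypothesis (illustrative sketch,
    not necessarily the authors' exact formulation)."""
    return (lam * ctc_score(hyp)             # CTC prefix log-probability
            + (1.0 - lam) * att_score(hyp)   # attention decoder log-probability
            + gamma * lm_score(hyp))         # separately trained LSTM LM log-probability

def beam_search_step(candidates, scorers, beam_size=10):
    """Rescore expanded hypotheses with the joint score and keep the top beam_size."""
    ctc_score, att_score, lm_score = scorers
    scored = [(joint_score(h, ctc_score, att_score, lm_score), h) for h in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in scored[:beam_size]]
```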
Year
2017
DOI
10.21437/Interspeech.2017-1296
Venue
18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017)
Keywords
end-to-end speech recognition, encoder-decoder, connectionist temporal classification, attention model
DocType
Conference
Volume
abs/1706.02737
ISSN
2308-457X
Citations
28
PageRank
1.19
References
11
Authors
4
Name              Order  Citations  PageRank
Takaaki Hori      1      408        45.58
Shinji Watanabe   2      1158       139.38
Yu Zhang          3      442        41.79
William Chan      4      357        24.67