Abstract
---

Recurrent neural network transducers (RNN-T) have been successfully applied to end-to-end speech recognition. However, the recurrent structure makes parallelization difficult. In this paper, we propose a self-attention transducer (SA-T) for speech recognition. RNNs are replaced with self-attention blocks, which are powerful at modeling long-term dependencies within sequences and can be efficiently parallelized. Furthermore, a path-aware regularization is proposed to help the SA-T learn alignments and improve performance. Additionally, a chunk-flow mechanism is used to enable online decoding. All experiments are conducted on the Mandarin Chinese dataset AISHELL-1. The results demonstrate that our proposed approach achieves a 21.3% relative reduction in character error rate compared with the baseline RNN-T. In addition, the SA-T with the chunk-flow mechanism can perform online decoding with only a slight degradation in performance.
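The abstract only names the chunk-flow mechanism; the core idea of such streaming attention schemes is to restrict each frame's self-attention to its own chunk plus a limited amount of left context, so the encoder never needs future frames. Below is a minimal mask-building sketch of that idea in PyTorch. The function name and the `chunk_size` / `left_chunks` parameters are illustrative assumptions, not details taken from the paper.

```python
import torch

def chunk_flow_mask(seq_len: int, chunk_size: int, left_chunks: int) -> torch.Tensor:
    """Boolean self-attention mask: each frame attends only to its own chunk
    and a fixed number of preceding chunks (no future context).

    Illustrative sketch only; the paper's exact chunk-flow formulation may differ.
    """
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for t in range(seq_len):
        chunk_idx = t // chunk_size
        start = max(0, (chunk_idx - left_chunks) * chunk_size)
        end = min(seq_len, (chunk_idx + 1) * chunk_size)  # end of the current chunk
        mask[t, start:end] = True
    return mask

# Example: 8 frames, chunks of 2 frames, attend to the current chunk plus 1 previous chunk.
print(chunk_flow_mask(8, 2, 1).int())
```

With a mask of this form, attention never reaches beyond the current chunk, so encoder outputs can be emitted chunk by chunk during streaming decoding.
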
Year | DOI | Venue
---|---|---
2019 | 10.21437/Interspeech.2019-2203 | INTERSPEECH

DocType | Citations | PageRank
---|---|---
Conference | 1 | 0.36

References | Authors
---|---
0 | 5

Name | Order | Citations | PageRank |
---|---|---|---|
Zhengkun Tian | 1 | 3 | 5.79 |
Jiangyan Yi | 2 | 19 | 17.99 |
Jianhua Tao | 3 | 848 | 138.00 |
Ye Bai | 4 | 7 | 5.52 |
Zhengqi Wen | 5 | 86 | 24.41 |