Abstract
---

Recurrent neural network transducers (RNN-T) have been successfully applied to end-to-end speech recognition. However, the recurrent structure makes parallelization difficult. In this paper, we propose a self-attention transducer (SA-T) for speech recognition. RNNs are replaced with self-attention blocks, which are powerful at modeling long-term dependencies within sequences and can be efficiently parallelized. Furthermore, a path-aware regularization is proposed to help the SA-T learn alignments and improve performance. Additionally, a chunk-flow mechanism is used to enable online decoding. All experiments are conducted on the Mandarin Chinese dataset AISHELL-1. The results demonstrate that our proposed approach achieves a 21.3% relative reduction in character error rate compared with the baseline RNN-T. In addition, the SA-T with the chunk-flow mechanism can perform online decoding with only a slight degradation in performance.
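The abstract only names the chunk-flow mechanism; the core idea of such streaming attention schemes is to restrict each frame's self-attention to its own chunk plus a limited amount of left context, so the encoder never needs future frames. Below is a minimal mask-building sketch of that idea in PyTorch. The function name and the `chunk_size` / `left_chunks` parameters are illustrative assumptions, not details taken from the paper.

```python
import torch

def chunk_flow_mask(seq_len: int, chunk_size: int, left_chunks: int) -> torch.Tensor:
    """Boolean self-attention mask: each frame attends only to its own chunk
    and a fixed number of preceding chunks (no future context).

    Illustrative sketch only; the paper's exact chunk-flow formulation may differ.
    """
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for t in range(seq_len):
        chunk_idx = t // chunk_size
        start = max(0, (chunk_idx - left_chunks) * chunk_size)
        end = min(seq_len, (chunk_idx + 1) * chunk_size)  # end of the current chunk
        mask[t, start:end] = True
    return mask

# Example: 8 frames, chunks of 2 frames, attend to the current chunk plus 1 previous chunk.
print(chunk_flow_mask(8, 2, 1).int())
```

With a mask of this form, attention never reaches beyond the current chunk, so encoder outputs can be emitted chunk by chunk during streaming decoding.
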
Year | DOI | Venue
---|---|---
2019 | 10.21437/Interspeech.2019-2203 | INTERSPEECH

DocType | Citations | PageRank
---|---|---
Conference | 1 | 0.36

References | Authors
---|---
0 | 5

Name | Order | Citations | PageRank |
---|---|---|---|
Zhengkun Tian | 1 | 3 | 5.79 |
Jiangyan Yi | 2 | 19 | 17.99 |
Jianhua Tao | 3 | 848 | 138.00 |
Ye Bai | 4 | 7 | 5.52 |
Zhengqi Wen | 5 | 86 | 24.41 |