Abstract |
---|
In this paper, we propose a refined multi-stage, multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. A three-stage training scheme based on three levels of architectural granularity, namely the character encoder, the byte pair encoding (BPE) based encoder, and the attention decoder, is proposed. In addition, multi-task learning based on two levels of linguistic granularity, namely characters and BPE units, is used. We explore different pre-training strategies for the encoders, including transfer learning from a bidirectional encoder. Our encoder-decoder models with online attention show ~35% and ~10% relative improvement over their baselines for the smaller and bigger models, respectively. Our models achieve word error rates (WER) of 5.04% and 4.48% on the Librispeech test-clean data for the smaller and bigger models, respectively, after fusion with a long short-term memory (LSTM) based external language model (LM). |
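The abstract describes the training strategy only at a high level. As a rough illustration of multi-task learning at two linguistic granularities (character and BPE), the sketch below attaches two output heads to a shared encoder and combines their losses with an interpolation weight. This is an assumption-laden stand-in, not the authors' implementation: it uses plain CTC heads in place of the paper's online-attention decoders, and all names and hyperparameters (`TwoGranularityEncoder`, `char_weight`, vocabulary sizes) are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoGranularityEncoder(nn.Module):
    """Shared acoustic encoder with character-level and BPE-level output heads."""
    def __init__(self, feat_dim=80, hidden=256, char_vocab=32, bpe_vocab=1000):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.char_head = nn.Linear(hidden, char_vocab)  # character granularity
        self.bpe_head = nn.Linear(hidden, bpe_vocab)    # BPE granularity

    def forward(self, feats):
        enc, _ = self.encoder(feats)                    # (B, T, hidden)
        return self.char_head(enc), self.bpe_head(enc)

def multitask_loss(char_logits, bpe_logits, char_tgt, bpe_tgt,
                   in_lens, char_lens, bpe_lens, char_weight=0.5):
    """Weighted sum of character-level and BPE-level CTC losses
    (a simple stand-in for the paper's attention-decoder losses)."""
    char_lp = F.log_softmax(char_logits, dim=-1).transpose(0, 1)  # (T, B, V)
    bpe_lp = F.log_softmax(bpe_logits, dim=-1).transpose(0, 1)
    char_loss = F.ctc_loss(char_lp, char_tgt, in_lens, char_lens, blank=0)
    bpe_loss = F.ctc_loss(bpe_lp, bpe_tgt, in_lens, bpe_lens, blank=0)
    return char_weight * char_loss + (1.0 - char_weight) * bpe_loss

# Toy usage with random tensors, just to show the expected shapes.
model = TwoGranularityEncoder()
feats = torch.randn(4, 120, 80)                         # (batch, frames, features)
char_logits, bpe_logits = model(feats)
in_lens = torch.full((4,), 120, dtype=torch.long)
char_tgt = torch.randint(1, 32, (4, 40))
bpe_tgt = torch.randint(1, 1000, (4, 20))
loss = multitask_loss(char_logits, bpe_logits, char_tgt, bpe_tgt,
                      in_lens,
                      torch.full((4,), 40, dtype=torch.long),
                      torch.full((4,), 20, dtype=torch.long))
loss.backward()
```

In the paper's multi-stage setup, such heads would be trained in stages of increasing architectural granularity (character encoder, then BPE encoder, then attention decoder); the sketch above only illustrates how the two granularities can be combined into a single training objective.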
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ASRU46091.2019.9003936 | 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |
Keywords | DocType | ISBN
---|---|---|
Attention based encoder-decoder models, online attention, multi-stage training, multi-task learning | Conference | 978-1-7281-0307-5
Citations | PageRank | References
---|---|---|
0 | 0.34 | 0
Authors |
---|
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Abhinav Garg | 1 | 6 | 6.61 |
Dhananjaya Gowda | 2 | 3 | 5.47 |
Ankur N Kumar | 3 | 8 | 3.39 |
Kwangyoun Kim | 4 | 2 | 4.11 |
Mehul Kumar | 5 | 1 | 2.73 |
Chanwoo Kim | 6 | 253 | 28.44 |