Title
Pretraining by Backtranslation for End-to-End ASR in Low-Resource Settings
Abstract
We explore training attention-based encoder-decoder ASR models in low-resource settings. These models perform poorly when trained on small amounts of transcribed speech, in part because they depend on having sufficient target-side text to train the attention and decoder networks. In this paper we address this shortcoming by pretraining our network parameters using only text-based data and transcribed speech from other languages, and we analyze the relative contributions of both sources of data. Across three test languages, our text-based approach yields a 20% average relative improvement over a text-based augmentation technique without pretraining. Using transcribed speech from nearby languages gives a further 20-30% relative reduction in character error rate.
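The abstract describes pretraining the attention and decoder parameters from text alone, before any target-language audio is used. The sketch below shows one plausible way such text-only pretraining could be set up in PyTorch: a pseudo-encoder turns text into stand-in "encoder states" so the attention and decoder can be trained without audio. All names here (PseudoEncoder, AttentionDecoder, pretrain_on_text) are hypothetical illustrations under these assumptions, not the authors' implementation.

```python
# Hypothetical sketch of text-only decoder pretraining; names are
# illustrative assumptions, not the paper's actual code.
import torch
import torch.nn as nn

class PseudoEncoder(nn.Module):
    """Maps text to pseudo 'encoder states' so the attention and decoder
    parameters can be trained with no audio at all (assumption: one
    plausible realization of the paper's text-based pretraining)."""
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, text):
        states, _ = self.rnn(self.embed(text))
        return states                                    # (B, T, H)

class AttentionDecoder(nn.Module):
    """Character-level decoder attending over encoder states."""
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4,
                                          batch_first=True)
        self.rnn = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, enc_states, prev_chars):
        emb = self.embed(prev_chars)                     # (B, U, H)
        ctx, _ = self.attn(emb, enc_states, enc_states)  # attend to encoder
        hid, _ = self.rnn(torch.cat([emb, ctx], dim=-1))
        return self.out(hid)                             # (B, U, V)

def pretrain_on_text(decoder, pseudo_enc, text, opt, loss_fn):
    """One text-only pretraining step: predict each character from the
    previous ones plus pseudo encoder states built from the same text."""
    enc_states = pseudo_enc(text)
    logits = decoder(enc_states, text[:, :-1])           # teacher forcing
    loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                   text[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

vocab = 40                                               # character inventory
dec, penc = AttentionDecoder(vocab), PseudoEncoder(vocab)
opt = torch.optim.Adam(list(dec.parameters()) + list(penc.parameters()), 1e-3)
text = torch.randint(0, vocab, (8, 20))                  # stand-in text batch
print(pretrain_on_text(dec, penc, text, opt, nn.CrossEntropyLoss()))
# Stage 2 (not shown): swap PseudoEncoder for a real speech encoder, e.g.
# one trained on transcribed speech from related languages, and fine-tune
# the whole model on the low-resource target language.
```

In this sketch the attention and decoder weights carry over to fine-tuning, which matches the abstract's claim that sufficient target-side text is what those components chiefly need.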
Year
2019
DOI
10.21437/Interspeech.2019-3254
Venue
INTERSPEECH
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
6
Name                   Order  Citations  PageRank
Matthew Wiesner        1      5          2.85
Adithya Renduchintala  2      1          1.74
Shinji Watanabe        3      1158       139.38
Chunxi Liu             4      23         3.28
N. Dehak               5      1269       92.64
Sanjeev Khudanpur      6      2155       202.00