Title
Pre-training on high-resource speech recognition improves low-resource speech-to-text translation.
Abstract
We present a simple approach to improve direct speech-to-text translation (ST) when the source language is low-resource: we pre-train the model on a high-resource automatic speech recognition (ASR) task, and then fine-tune its parameters for ST. We demonstrate that our approach is effective by pre-training on 300 hours of English ASR data to improve Spanish-English ST from 10.8 to 20.2 BLEU when only 20 hours of Spanish-English ST training data is available. Through an ablation study, we find that the pre-trained encoder (acoustic model) accounts for most of the improvement, which is surprising since the shared language in these tasks is the target language (text), and not the source language (audio). Applying this insight, we show that pre-training on ASR helps ST even when the ASR language differs from both source and target ST languages: pre-training on French ASR also improves Spanish-English ST. Finally, we show that the approach improves a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1 BLEU.
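The recipe described above is a two-stage transfer: first train an attention-based encoder-decoder on high-resource ASR (speech in, text out), then reuse its parameters, especially the encoder/acoustic model, to initialize an ST model that is fine-tuned on the small ST corpus. The PyTorch sketch below is an illustrative assumption, not the authors' actual architecture or code: the model sizes, the `SpeechSeq2Seq` class, and the random tensors standing in for filterbank features and token IDs are all placeholders.

```python
# Minimal sketch of ASR pre-training followed by ST fine-tuning (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechSeq2Seq(nn.Module):
    """Speech encoder + text decoder; the same architecture serves ASR and ST,
    since both map audio features to a text token sequence."""
    def __init__(self, n_mel=80, hidden=256, vocab_size=1000):
        super().__init__()
        self.encoder = nn.GRU(n_mel, hidden, num_layers=2, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, feats, tokens):
        _, h = self.encoder(feats)                 # encode audio features
        dec_in = self.embed(tokens)                # teacher-forced target tokens
        dec_out, _ = self.decoder(dec_in, h[-1:])  # condition decoder on encoder state
        return self.out(dec_out)                   # per-step vocabulary logits

def train_step(model, feats, tokens, optimizer):
    logits = model(feats, tokens[:, :-1])
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# --- Stage 1: pre-train on high-resource ASR (e.g. English speech -> English text) ---
asr_model = SpeechSeq2Seq()
asr_opt = torch.optim.Adam(asr_model.parameters(), lr=1e-3)
asr_feats = torch.randn(8, 200, 80)              # placeholder filterbank features
asr_tokens = torch.randint(0, 1000, (8, 30))     # placeholder transcript token IDs
train_step(asr_model, asr_feats, asr_tokens, asr_opt)

# --- Stage 2: initialize the ST model from the pre-trained ASR parameters ---
st_model = SpeechSeq2Seq()
st_model.load_state_dict(asr_model.state_dict())  # transfer all pre-trained weights
# To transfer only the encoder (the component the ablation finds most useful):
# st_model.encoder.load_state_dict(asr_model.encoder.state_dict())

# --- Stage 3: fine-tune on the low-resource ST data (source speech, target text) ---
st_opt = torch.optim.Adam(st_model.parameters(), lr=1e-4)
st_feats = torch.randn(8, 200, 80)               # placeholder source-language speech
st_tokens = torch.randint(0, 1000, (8, 30))      # placeholder target-language tokens
train_step(st_model, st_feats, st_tokens, st_opt)
```

Because the encoder carries most of the benefit, transferring only the encoder (the commented line) is what makes pre-training useful even when the ASR language matches neither ST language.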
Year
2018
DOI
10.18653/v1/n19-1006
Venue
North American Chapter of the Association for Computational Linguistics (NAACL)
Field
Training set, Computer science, Speech recognition, Natural language processing, Encoder, Artificial intelligence, Acoustic model
DocType
Volume
Journal
abs/1809.01431
Citations
1
PageRank
0.36
References
23
Authors
5
Name | Order | Citations | PageRank
Sameer Bansal | 1 | 3 | 1.07
Herman Kamper | 2 | 150 | 20.70
Karen Livescu | 3 | 1254 | 71.43
Adam Lopez | 4 | 538 | 34.69
Sharon Goldwater | 5 | 1437 | 103.96