Abstract
---|
In this paper, we improve speech translation (ST) by effectively leveraging large quantities of unlabeled speech and text data in different and complementary ways. We explore both pretraining and self-training using the large Libri-Light speech audio corpus, and language modeling with CommonCrawl. Our experiments improve over the previous state of the art by 2.6 BLEU on average across all four considered CoVoST 2 language pairs via a simple recipe: combining wav2vec 2.0 pretraining, a single iteration of self-training, and decoding with a language model. Unlike existing work, our approach does not leverage any supervision other than ST data. Code and models will be publicly released. |
Year | DOI | Venue |
---|---|---|
2021 | 10.21437/Interspeech.2021-1912 | Interspeech |
DocType | Citations | PageRank
---|---|---|
Conference | 0 | 0.34
References | Authors
---|---|
0 | 6
Name | Order | Citations | PageRank |
---|---|---|---|
Changhan Wang | 1 | 0 | 2.37 |
Anne Wu | 2 | 0 | 0.68 |
Juan Pino | 3 | 21 | 12.63 |
Alexei Baevski | 4 | 85 | 9.52 |
Michael Auli | 5 | 0 | 1.69 |
Alexis Conneau | 6 | 342 | 15.03 |