Abstract |
---|
The ultimate goal of transfer learning is to reduce labeled data requirements by exploiting a pre-existing embedding model trained on a different dataset or task. While significant progress has been made in the visual and language domains, the speech community has yet to identify a strategy with wide-reaching applicability across tasks. This paper describes a representation of speech based on an unsupervised triplet-loss objective that exceeds state-of-the-art performance on a number of transfer-learning tasks drawn from the non-semantic speech domain. The embedding is trained on a publicly available dataset and tested on a variety of low-resource downstream tasks, including personalization tasks and tasks in the medical domain. The model will be publicly released. |
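The abstract names an unsupervised triplet-loss objective but this record does not spell it out. Below is a minimal, hedged sketch of a generic hinge-style triplet loss over precomputed embedding vectors, assuming squared-Euclidean distance and a `margin` hyperparameter; the function name and toy vectors are illustrative, not taken from the paper (whose unsupervised setup draws anchor/positive pairs from nearby segments of the same audio clip).

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the anchor toward the positive and
    push it away from the negative by at least `margin` (squared L2)."""
    d_pos = np.sum((anchor - positive) ** 2)   # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2)   # anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)

# Toy embeddings: the positive sits near the anchor, the negative far away,
# so the hinge is inactive and the loss is zero.
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])
n = np.array([-1.0, 0.0])
print(triplet_loss(a, p, n))  # → 0.0
```

In practice the loss is averaged over a batch of triplets and minimized with gradient descent over the embedding network's parameters; this sketch only shows the per-triplet term.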
Year | DOI | Venue |
---|---|---|
2020 | 10.21437/Interspeech.2020-1242 | INTERSPEECH |
DocType | Citations | PageRank |
---|---|---|
Conference | 8 | 0.50 |
References | Authors |
---|---|
0 | 10 |
Name | Order | Citations | PageRank |
---|---|---|---|
Joel Shor | 1 | 55 | 5.47 |
Lorena Álvarez | 2 | 504 | 36.47 |
Ronnie Maor | 3 | 8 | 0.50 |
Oran Lang | 4 | 50 | 2.76 |
Felix de Chaumont Quitry | 5 | 22 | 2.44 |
Marco Tagliasacchi | 6 | 14 | 6.71 |
Omry Tuval | 7 | 8 | 0.50 |
Ira Shavitt | 8 | 8 | 0.50 |
Dotan Emanuel | 9 | 8 | 0.50 |
Yinnon Haviv | 10 | 8 | 0.50 |