Abstract |
---|
We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech representations in many languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation and retrieval. Covering 102 languages from more than 10 language families and 3 different domains, XTREME-S aims to simplify multilingual speech representation evaluation and to catalyze research in "universal" speech representation learning. This paper describes the new benchmark and establishes the first speech-only and speech-text baselines on all downstream tasks, using XLS-R and mSLAM. We motivate the design choices and detail how to use the benchmark. Datasets and fine-tuning scripts are made easily accessible. |

Year | DOI | Venue |
---|---|---|
2022 | 10.21437/INTERSPEECH.2022-10007 | Conference of the International Speech Communication Association (INTERSPEECH) |

DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |

References | Authors |
---|---|
0 | 18 |

Name | Order | Citations | PageRank |
---|---|---|---|
Alexis Conneau | 1 | 342 | 15.03 |
Ankur Bapna | 2 | 36 | 8.45 |
Yu Zhang | 3 | 442 | 41.79 |
Min Ma | 4 | 19 | 7.30 |
Patrick von Platen | 5 | 1 | 1.04 |
Anton Lozhkov | 6 | 0 | 0.34 |
Colin Cherry | 7 | 236 | 18.15 |
Ye Jia | 8 | 0 | 1.69 |
Clara Rivera | 9 | 0 | 2.37 |
Mihir Kale | 10 | 0 | 2.03 |
Daan Van Esch | 11 | 0 | 0.34 |
Vera Axelrod | 12 | 0 | 0.34 |
Simran Khanuja | 13 | 0 | 0.68 |
Jonathan H. Clark | 14 | 6 | 1.15 |
Orhan Firat | 15 | 26 | 2.54 |
Sebastian Ruder | 16 | 424 | 28.13 |
Jason Riesa | 17 | 432 | 14.44 |
Melvin Johnson | 18 | 5 | 3.10 |