Title
XTREME-S: Evaluating Cross-lingual Speech Representations
Abstract
We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech representations in many languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation and retrieval. Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning. This paper describes the new benchmark and establishes the first speech-only and speech-text baselines using XLS-R and mSLAM on all downstream tasks. We motivate the design choices and detail how to use the benchmark. Datasets and fine-tuning scripts are made easily accessible.
Year: 2022
DOI: 10.21437/INTERSPEECH.2022-10007
Venue: Conference of the International Speech Communication Association (INTERSPEECH)
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 18
Name | Order | Citations | PageRank
Alexis Conneau | 1 | 342 | 15.03
Ankur Bapna | 2 | 36 | 8.45
Yu Zhang | 3 | 442 | 41.79
Min Ma | 4 | 19 | 7.30
Patrick von Platen | 5 | 1 | 1.04
Anton Lozhkov | 6 | 0 | 0.34
Colin Cherry | 7 | 236 | 18.15
Ye Jia | 8 | 0 | 1.69
Clara Rivera | 9 | 0 | 2.37
Mihir Kale | 10 | 0 | 2.03
Daan Van Esch | 11 | 0 | 0.34
Vera Axelrod | 12 | 0 | 0.34
Simran Khanuja | 13 | 0 | 0.68
Jonathan H. Clark | 14 | 6 | 1.15
Orhan Firat | 15 | 26 | 2.54
Sebastian Ruder | 16 | 424 | 28.13
Jason Riesa | 17 | 432 | 14.44
Melvin Johnson | 18 | 5 | 3.10