XTREME-S: Evaluating Cross-lingual Speech Representations - Citegraph

Paper Info

Title
XTREME-S: Evaluating Cross-lingual Speech Representations

Abstract
We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech representations in many languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation and retrieval. Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning. This paper describes the new benchmark and establishes the first speech-only and speech-text baselines using XLS-R and mSLAM on all downstream tasks. We motivate the design choices and detail how to use the benchmark. Datasets and fine-tuning scripts are made easily accessible.

Field	Value
Year	2022
DOI	10.21437/INTERSPEECH.2022-10007
Venue	Conference of the International Speech Communication Association (INTERSPEECH)
DocType	Conference
Citations	0
PageRank	0.34
References	0
Authors	18

Authors (18 rows)

Name	Order	Citations	PageRank
Alexis Conneau	1	342	15.03
Ankur Bapna	2	36	8.45
Yu Zhang	3	442	41.79
Min Ma	4	19	7.30
Patrick von Platen	5	1	1.04
Anton Lozhkov	6	0	0.34
Colin Cherry	7	236	18.15
Ye Jia	8	0	1.69
Clara Rivera	9	0	2.37
Mihir Kale	10	0	2.03
Daan Van Esch	11	0	0.34
Vera Axelrod	12	0	0.34
Simran Khanuja	13	0	0.68
Jonathan H. Clark	14	6	1.15
Orhan Firat	15	26	2.54
Sebastian Ruder	16	424	28.13
Jason Riesa	17	432	14.44
Melvin Johnson	18	5	3.10
