MLS: A Large-Scale Multilingual Dataset for Speech Research - Citegraph

Paper Info

Title
MLS: A Large-Scale Multilingual Dataset for Speech Research

Abstract
This paper introduces the Multilingual LibriSpeech (MLS) dataset, a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and covers 8 languages, including about 44.5K hours of English and a total of about 6K hours for the other languages. Additionally, we provide Language Models (LM) and baseline Automatic Speech Recognition (ASR) models for all the languages in our dataset. We believe such a large transcribed dataset will open new avenues in ASR and Text-To-Speech (TTS) research. The dataset will be made freely available to anyone at http://www.openslr.org.

Year	2020
DOI	10.21437/Interspeech.2020-2826
Venue	INTERSPEECH
DocType	Conference
ISSN	Interspeech 2020
Citations	4
PageRank	0.39
References	0
Authors	5

Authors

Name	Order	Citations	PageRank
Vineel Pratap	1	16	2.69
Qiantong Xu	2	34	7.42
Anuroop Sriram	3	5	0.76
Gabriel Synnaeve	4	27	7.73
Ronan Collobert	5	4002	308.61
