BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus - Citegraph

Paper Info

Title
BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus

Abstract
BibleTTS is a large, high-quality, open speech dataset for ten languages spoken in Sub-Saharan Africa. The corpus contains up to 86 hours of aligned, studio-quality 48 kHz single-speaker recordings per language, enabling the development of high-quality text-to-speech models. The ten languages represented are Akuapem Twi, Asante Twi, Chichewa, Ewe, Hausa, Kikuyu, Lingala, Luganda, Luo, and Yoruba. This corpus is a derivative work of Bible recordings made and released by the Open.Bible project from Biblica. We have aligned, cleaned, and filtered the original recordings, and additionally hand-checked a subset of the alignments for each language. We present results for text-to-speech models with Coqui TTS. The data is released under a commercial-friendly CC-BY-SA license.

Year	DOI	Venue
2022	10.21437/INTERSPEECH.2022-10850	Conference of the International Speech Communication Association (INTERSPEECH)
DocType	Citations	PageRank
Conference	0	0.34
References	Authors
0	19

Authors (19 rows)

Name	Order	Citations	PageRank
Josh Meyer	1	0	1.35
David Ifeoluwa Adelani	2	0	1.35
Edresson Casanova	3	1	2.39
Alp Öktem	4	0	0.68
Daniel Whitenack, Julian Weber	5	0	0.34
Salomon Kabongo	6	0	0.68
Elizabeth Salesky	7	0	1.35
Iroro Orife	8	0	1.35
Colin Leong	9	0	1.01
Perez Ogayo	10	0	2.03
Chris Emezue	11	0	1.01
Jonathan Mukiibi	12	0	1.01
Salomey Osei	13	0	1.35
Apelete Agbolo	14	0	0.34
Victor Akinode	15	0	0.34
Bernard Opoku	16	0	0.34
Samuel Olanrewaju	17	0	0.34
Jesujoba O. Alabi	18	0	1.69
Shamsuddeen Muhammad	19	0	0.68
