Title
ASR Corpus Design for Resource-Scarce Languages
Abstract
We investigate the number of speakers and the amount of data that are required for the development of useable speaker-independent speech-recognition systems in resource-scarce languages. Our experiments employ the Lwazi corpus, which contains speech in the eleven official languages of South Africa. We find that a surprisingly small number of speakers (fewer than 50) and around 10 to 20 hours of speech per language are sufficient for acceptable phone-based recognition.
Year
2009
Venue
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5
Keywords
speech recognition, corpus design
Field
Speech corpus, Computer science, Speech recognition, Phone, Natural language processing, Artificial intelligence, VoxForge
DocType
Conference
Citations
24
PageRank
1.75
References
4
Authors
3
Name, Order, Citations, PageRank
Etienne Barnard, 1, 438, 57.85
Marelie H. Davel, 2, 236, 22.70
Charl Johannes van Heerden, 3, 133, 12.50