Title
Impact of a newly developed modern standard Arabic speech corpus on implementing and evaluating automatic continuous speech recognition systems
Abstract
Being the current formal linguistic standard and the only acceptable form of the Arabic language for all native speakers, Modern Standard Arabic (MSA) still lacks sufficient spoken corpora compared to other forms such as Dialectal Arabic. This paper describes our work towards developing a new speech corpus for MSA, which can be used for implementing and evaluating any Arabic automatic continuous speech recognition system. The speech corpus contains 415 sentences (367 training and 48 testing) recorded by 42 native Arabic speakers (21 male and 21 female) from 11 countries representing three major regions (Levant, Gulf, and Africa). The impact of using this speech corpus on the overall performance of Arabic automatic continuous speech recognition systems was examined. Two development phases were conducted based on the size of training data, Gaussian mixture distributions, and tied states (senones). Overall results indicate that a larger training data size results in higher word recognition rates and lower Word Error Rates (WER).
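The abstract reports its results as Word Error Rate. The following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. It assumes whitespace tokenization, and the function name `word_error_rate` is illustrative; it is not taken from the paper, whose scoring tool is not stated in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: both strings are tokenized on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.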
Year: 2010
DOI: 10.1007/978-3-642-16202-2_1
Venue: IWSDS
Keywords: modern standard arabic speech,arabic native speaker,speech corpus,arabic automatic continuous speech,new speech corpus,recognition system,result higher word recognition,automatic continuous speech recognition,arabic language,dialectal arabic,modern standard arabic,larger training data size,word recognition,speech recognition,native speaker,word error rate
Field: Speech corpus,Training set,Arabic,Computer science,Word recognition,Text corpus,Speech recognition,Modern Standard Arabic,Natural language processing,Artificial intelligence,Continuous speech recognition system,VoxForge
DocType: Conference
Volume: 6392
ISSN: 0302-9743
ISBN: 3-642-16201-5
Citations: 2
PageRank: 0.49
References: 0
Authors
5