Title
Arabic speech recognition using SPHINX engine
Abstract
Although the Arab world has an estimated 250 million Arabic speakers, there has been little research on Arabic speech recognition compared with other languages of similar importance (e.g. Mandarin). Because diacritized Arabic text and Arabic pronunciation dictionaries (PDs) are scarce, most previous work on Arabic automatic speech recognition has concentrated on recognizers that use Romanized characters. Such a system recognizes the Arabic word as if it were an English one, then maps it back to the Arabic word through a lookup table that pairs each Arabic word with its Romanized pronunciation. In this work, we introduce the first SPHINX-IV-based Arabic recognizer and propose an automatic toolkit capable of producing a PD for both the Holy Qura’an and Standard Arabic. Three corpora were developed from scratch in this work: the Holy Qura’an Corpus HQC-1 (about 18.5 hours of speech), the Command and Control Corpus CAC-1 (about 1.5 hours) and the Arabic Digits Corpus ADC (less than one hour). The building process is fully described. Fully diacritized Arabic transcriptions were also developed for all three corpora. The SPHINX-IV engine was customized and trained for both the language model and the lexicon modules shown in the framework architecture block diagram on the next page. Using the three corpora, the PD produced by our automatic tool and the transcripts, the SPHINX-IV engine was trained and tuned to produce three acoustic models, one for each corpus. Training is based on hidden Markov models (HMMs) built from statistics and random-variable distributions estimated from the training data itself. A new algorithm is proposed that adds unlabeled data to the training corpus in order to increase its size. The algorithm uses a neural-network confidence scorer to annotate the decoded speech and decide whether each proposed transcript is accepted and added to the seed corpus.
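The confidence-based selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the network architecture, its input features or the acceptance threshold, so the one-hidden-layer network, the per-utterance feature vector and the names Hypothesis, mlp_confidence and select_for_training are all hypothetical. In practice the weights would come from training the scorer on decoded utterances whose correctness is known.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Hypothesis:
    """A decoded, unlabeled utterance (all fields are hypothetical)."""
    utterance_id: str
    transcript: str        # decoder output, e.g. a diacritized Arabic word string
    features: List[float]  # assumed confidence features, e.g. normalized acoustic/LM scores


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mlp_confidence(features: Sequence[float],
                   w_hidden: Sequence[Sequence[float]], b_hidden: Sequence[float],
                   w_out: Sequence[float], b_out: float) -> float:
    """Hypothetical scorer: tanh hidden layer, sigmoid output in (0, 1)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return _sigmoid(z)


def select_for_training(hypotheses: Sequence[Hypothesis],
                        scorer: Callable[[Sequence[float]], float],
                        threshold: float) -> Tuple[List[Hypothesis], List[Hypothesis]]:
    """Split decoded utterances into accepted (added to the seed corpus) and rejected."""
    accepted: List[Hypothesis] = []
    rejected: List[Hypothesis] = []
    for hyp in hypotheses:
        if scorer(hyp.features) >= threshold:
            accepted.append(hyp)
        else:
            rejected.append(hyp)
    return accepted, rejected
```

In a self-training loop of this kind, the accepted utterances would typically be appended to the seed corpus, the acoustic model retrained, and the remaining pool decoded and scored again.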
The model parameters were fine-tuned using a simulated annealing algorithm, and the optimum values were tested and reported. Our main contribution is the use of the open-source SPHINX-IV engine for Arabic speech recognition, building our own language and acoustic models without Romanizing the Arabic speech. The system was fine-tuned, and the data were refined for training and validation. Optimum values for the number of Gaussian mixture components and the number of HMM states were found according to specified performance measures. Optimum confidence-score values were also found for the training data. Although much more work needs to be done to complete a work of this size, we consider the corpus used in our system sufficient to validate our approach. SPHINX has never before been used in this manner for Arabic speech recognition. This work is an invitation to all open-source speech recognition developers and groups to take over and build on what we have started.
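Tuning the number of HMM states and Gaussian mixture components with simulated annealing can be sketched as a search over a small discrete grid. The abstract does not give the search space, neighbour move, cooling schedule or cost function, so the candidate values, the one-step neighbour move, the geometric cooling and the function names below are assumptions for illustration. In a real setup each cost evaluation would mean training and decoding with SPHINX and measuring an error rate, which is why already-evaluated configurations are cached.

```python
import math
import random
from typing import Callable, Dict, Tuple

# Assumed search space (hypothetical): HMM states per model and Gaussians per state.
STATE_CHOICES = (3, 4, 5)
MIXTURE_CHOICES = (1, 2, 4, 8, 16, 32)

Index = Tuple[int, int]  # (index into STATE_CHOICES, index into MIXTURE_CHOICES)


def neighbour(idx: Index, rng: random.Random) -> Index:
    """Move one step up or down along one axis, staying inside the grid."""
    s, m = idx
    moves = []
    if s > 0:
        moves.append((s - 1, m))
    if s < len(STATE_CHOICES) - 1:
        moves.append((s + 1, m))
    if m > 0:
        moves.append((s, m - 1))
    if m < len(MIXTURE_CHOICES) - 1:
        moves.append((s, m + 1))
    return rng.choice(moves)


def anneal(cost: Callable[[int, int], float], t0: float = 1.0,
           cooling: float = 0.95, steps: int = 300, seed: int = 0):
    """Simulated annealing over (states, mixtures); returns best config and its cost."""
    rng = random.Random(seed)
    cache: Dict[Index, float] = {}  # each real evaluation may be a full train/decode run

    def evaluate(idx: Index) -> float:
        if idx not in cache:
            cache[idx] = cost(STATE_CHOICES[idx[0]], MIXTURE_CHOICES[idx[1]])
        return cache[idx]

    current = (0, 0)
    current_cost = evaluate(current)
    best, best_cost = current, current_cost
    temperature = t0
    for _ in range(steps):
        candidate = neighbour(current, rng)
        candidate_cost = evaluate(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature = max(temperature * cooling, 1e-6)  # geometric cooling schedule
    return (STATE_CHOICES[best[0]], MIXTURE_CHOICES[best[1]]), best_cost
```

Because the best configuration seen so far is tracked separately from the current one, accepting uphill moves early on costs nothing in the final answer. It only helps the search escape local minima of the error surface.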
Year
2006
DOI
10.1007/s10772-008-9009-1
Venue
International Journal of Speech Technology
Keywords
lookup table, random variable, speech recognition, language model, simulated annealing algorithm, arabic language, neural network, automatic speech recognition, command and control
DocType
Journal
Volume
9
Issue
3-4
ISSN
1572-8110
Citations
12
PageRank
1.07
References
14
Authors
2
Name | Order | Citations | PageRank
Hussein Hyassat | 1 | 12 | 1.07
Raed Abu Zitar | 2 | 87 | 10.95