Abstract
Although the Arab world has an estimated 250 million Arabic speakers, there has been little research on Arabic speech
recognition compared with other languages of similar importance (e.g., Mandarin). Because diacritized Arabic text and
pronunciation dictionaries (PDs) are scarce, most previous work on Arabic automatic speech recognition has concentrated on
recognizers that use Romanized characters: the system recognizes an Arabic word as if it were an English one and then maps
it back to the Arabic word through a lookup table that pairs each Arabic word with its Romanized pronunciation.
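For contrast, the Romanization-based pipeline described above can be sketched as a dictionary lookup applied to the recognizer's output. This is a minimal illustration, not code from any cited system: the table `ROMAN_TO_ARABIC`, its entries, and the function `romanized_to_arabic` are hypothetical.

```python
# Hypothetical lookup table from Romanized pronunciation to Arabic word.
# Entries are illustrative only.
ROMAN_TO_ARABIC = {
    "kitab": "كتاب",  # book
    "qalam": "قلم",   # pen
    "bab": "باب",     # door
}


def romanized_to_arabic(tokens):
    """Map Romanized tokens emitted by an English-style recognizer back to
    Arabic script; tokens missing from the table become "<unk>"."""
    return [ROMAN_TO_ARABIC.get(tok, "<unk>") for tok in tokens]


if __name__ == "__main__":
    print(romanized_to_arabic(["kitab", "bab", "xyz"]))
```

A word missing from the table cannot be recovered at this stage, and because each Romanized key holds a single value, two Arabic words that share a Romanized form cannot both be represented.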
In this work, we introduce the first SPHINX-IV-based Arabic recognizer and propose an automatic toolkit capable of producing
a PD for both the Holy Qura’an and standard Arabic. Three corpora were developed entirely in this work: the Holy Qura’an
Corpus (HQC-1), about 18.5 hours; the command and control corpus (CAC-1), about 1.5 hours; and the Arabic digits corpus
(ADC), less than one hour of speech. The building process is described in full, and fully diacritized Arabic transcriptions
were also developed for all three corpora.
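Because fully diacritized Arabic spells out short vowels and gemination, a pronunciation can be derived largely by rule. The sketch below is a hypothetical, heavily simplified grapheme-to-phoneme converter, not the authors' toolkit. It covers only a few consonants, the three short-vowel marks, sukun, shadda, and fatha followed by alef as a long vowel. The phone symbols and the names `pronounce` and `dictionary_line` are assumptions. Entries are emitted in the word-followed-by-phones layout of a CMU Sphinx dictionary file.

```python
# Hypothetical phone symbols for a small subset of Arabic consonants.
CONSONANTS = {
    "\u0628": "B",  # ب ba
    "\u062A": "T",  # ت ta
    "\u062F": "D",  # د dal
    "\u0631": "R",  # ر ra
    "\u0633": "S",  # س sin
    "\u0642": "Q",  # ق qaf
    "\u0643": "K",  # ك kaf
    "\u0644": "L",  # ل lam
    "\u0645": "M",  # م mim
    "\u0646": "N",  # ن nun
}
SHORT_VOWELS = {"\u064E": "A", "\u064F": "U", "\u0650": "I"}  # fatha, damma, kasra
ALEF, SHADDA, SUKUN = "\u0627", "\u0651", "\u0652"
MARKS = set(SHORT_VOWELS) | {SHADDA, SUKUN}


def pronounce(word):
    """Return the phone sequence for a fully diacritized word (toy rules)."""
    phones = []
    i = 0
    while i < len(word):
        base = word[i]
        i += 1
        # Collect the diacritics attached to this letter, in any order.
        marks = set()
        while i < len(word) and word[i] in MARKS:
            marks.add(word[i])
            i += 1
        if base == ALEF and phones and phones[-1] == "A":
            phones[-1] = "AA"  # fatha followed by alef: long vowel
            continue
        if base not in CONSONANTS:
            raise ValueError(f"unsupported letter {base!r} in {word!r}")
        phones.append(CONSONANTS[base])
        if SHADDA in marks:
            phones.append(CONSONANTS[base])  # shadda: geminate the consonant
        # Append the short vowel, if any; sukun contributes no phone.
        for mark, vowel in SHORT_VOWELS.items():
            if mark in marks:
                phones.append(vowel)
    return phones


def dictionary_line(word):
    """Format one entry as: WORD PHONE PHONE ..."""
    return word + " " + " ".join(pronounce(word))


if __name__ == "__main__":
    kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # كَتَبَ
    print(dictionary_line(kataba))
```

Grouping each letter with its marks before emitting phones makes the result independent of whether shadda is stored before or after the vowel mark.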
The SPHINX-IV engine was customized and trained for both the language model and the lexicon modules shown in the framework
architecture block diagram on the next page.
Using the three corpora, their transcripts, and the PD produced by our automatic tool, the SPHINX-IV engine is trained and
tuned to build three acoustic models, one per corpus. Training is based on hidden Markov models (HMMs) built from statistical
information and random-variable distributions estimated from the training data itself. A new algorithm is proposed for
adding unlabeled data to the training corpus to increase its size. The algorithm uses a neural network confidence scorer to
annotate the decoded speech and decide whether each proposed transcript is accepted and added to the seed corpus.
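The data-selection step can be pictured as scoring each decoded utterance and keeping only the confident ones. In this sketch the scorer is a tiny fixed-weight 2-2-1 multilayer perceptron. The two input features, the weights, and the 0.8 threshold are hypothetical placeholders, since the abstract does not specify the scorer's architecture or inputs. In practice the weights would be learned from decoded utterances whose correctness is known.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    utt_id: str
    transcript: str
    features: tuple  # hypothetical, e.g. (normalized acoustic score, LM score)


# Hypothetical weights of a 2-2-1 network (tanh hidden layer, sigmoid output).
W1 = [[1.5, -0.5], [0.8, 1.2]]
B1 = [0.0, -0.2]
W2 = [2.0, 1.5]
B2 = -1.0


def confidence(x):
    """Forward pass: returns a score in (0, 1) for one feature vector."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, B1)]
    z = sum(w * h for w, h in zip(W2, hidden)) + B2
    return 1.0 / (1.0 + math.exp(-z))


def select_for_training(hyps, threshold=0.8):
    """Keep decoded utterances whose transcript is confident enough to be
    added to the seed corpus."""
    return [h for h in hyps if confidence(h.features) >= threshold]


if __name__ == "__main__":
    decoded = [
        Hypothesis("u1", "<decoded transcript 1>", (1.0, 1.0)),
        Hypothesis("u2", "<decoded transcript 2>", (0.0, 0.0)),
    ]
    for h in decoded:
        print(h.utt_id, round(confidence(h.features), 3))
    print([h.utt_id for h in select_for_training(decoded)])
```

With these placeholder weights, `u1` scores about 0.87 and is kept, while `u2` scores about 0.21 and is rejected.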
The model parameters were fine-tuned using a simulated annealing algorithm, and the optimum values were tested and reported.
Our main contribution is applying the open-source SPHINX-IV system to Arabic speech recognition by building our own language
and acoustic models without Romanizing the Arabic speech. The system is fine-tuned, and the data are refined for training and
validation. Optimum values for the number of Gaussian mixture components and the number of HMM states were found according to
specified performance measures, and optimum confidence-score values were found for the training data. Although much more
work is needed to complete a project of this size, we consider the corpora used in our system sufficient to validate our
approach. SPHINX had not previously been used in this manner for Arabic speech recognition. This work is an invitation to
all open-source speech recognition developers and groups to build on what we have started.
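Simulated annealing suits this kind of tuning because settings such as the number of HMM states and Gaussian mixture components are discrete and each evaluation is expensive. The sketch below anneals over those two settings. `toy_error_rate` is a hypothetical stand-in for a full train-and-decode run that would return the word error rate. The candidate grids, cooling schedule, and step count are assumptions rather than the authors' values.

```python
import math
import random

# Hypothetical candidate grids for the two discrete settings.
STATE_CHOICES = [3, 4, 5, 6, 7]
GAUSSIAN_CHOICES = [1, 2, 4, 8, 16, 32]


def toy_error_rate(states, gaussians):
    """Hypothetical stand-in for training + decoding; returns a fake WER (%)."""
    return 10.0 + 2.0 * abs(states - 5) + 1.5 * abs(math.log2(gaussians) - 3)


def cost(idx):
    i_s, i_g = idx
    return toy_error_rate(STATE_CHOICES[i_s], GAUSSIAN_CHOICES[i_g])


def neighbor(idx, rng):
    """Step one setting to an adjacent grid value, clamped to the grid."""
    i_s, i_g = idx
    step = rng.choice((-1, 1))
    if rng.random() < 0.5:
        i_s = min(max(i_s + step, 0), len(STATE_CHOICES) - 1)
    else:
        i_g = min(max(i_g + step, 0), len(GAUSSIAN_CHOICES) - 1)
    return (i_s, i_g)


def anneal(start=(0, 0), t0=5.0, cooling=0.95, steps=500, seed=0):
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        cand = neighbor(current, rng)
        cand_cost = cost(cand)
        delta = cand_cost - current_cost
        # Metropolis rule: always accept improvements, accept worse moves
        # with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= cooling  # geometric cooling schedule
    return STATE_CHOICES[best[0]], GAUSSIAN_CHOICES[best[1]], best_cost


if __name__ == "__main__":
    print(anneal())
```

The toy objective has its minimum at 5 states and 8 Gaussians, which the search is expected to find. A real run would swap in the expensive objective and would likely afford far fewer steps.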
Year | DOI | Venue |
---|---|---|
2006 | 10.1007/s10772-008-9009-1 | International Journal of Speech Technology |

Keywords | DocType | Volume |
---|---|---|
lookup table, random variable, speech recognition, language model, simulated annealing algorithm, arabic language, neural network, automatic speech recognition, command and control | Journal | 9 |

Issue | ISSN | Citations |
---|---|---|
3-4 | 1572-8110 | 12 |

PageRank | References | Authors |
---|---|---|
1.07 | 14 | 2 |

Name | Order | Citations | PageRank |
---|---|---|---|
Hussein Hyassat | 1 | 12 | 1.07 |
Raed Abu Zitar | 2 | 87 | 10.95 |