Title
Morphological Decomposition for Arabic Broadcast News Transcription
Abstract
In this paper, we present a novel approach to morphological decomposition for large-vocabulary Arabic speech recognition. It achieves a low out-of-vocabulary (OOV) rate as well as high recognition accuracy in a state-of-the-art Arabic broadcast news transcription system. In this approach, compound words are decomposed into stems and affixes in both the language model training data and the acoustic training data. The decomposed words in the recognition output are re-joined before scoring. Four decomposition algorithms are evaluated and compared in this work. The best system achieved a 1.9% absolute (9.8% relative) reduction in word error rate (WER) compared to the 64K-word baseline. The recognition performance of this system is also comparable to that of a 300K-word recognition system trained on undecomposed words. At the same time, the decomposed system is much faster and needs less memory than systems with vocabularies larger than 64K words.
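The decompose-then-rejoin pipeline described in the abstract can be illustrated with a minimal sketch. This is not one of the four algorithms from the paper, which the abstract does not specify. The affix lists (Buckwalter-style transliteration), the `+` marker convention, the minimum stem length, and the function names `decompose` and `rejoin` are all assumptions made for illustration.

```python
# Hypothetical affix inventories (illustrative only, not taken from the paper).
# Prefixes are listed longest-first so that "wAl" is tried before "w".
PREFIXES = ("wAl", "Al", "w", "b")
SUFFIXES = ("hm", "hA", "h")
MIN_STEM_LEN = 3  # assumed guard against over-splitting short words


def decompose(word):
    """Split a word into [prefix+] stem [+suffix] units.

    A trailing '+' marks a prefix and a leading '+' marks a suffix,
    so the split can be undone later.
    """
    units = []
    stem = word
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_STEM_LEN:
            units.append(prefix + "+")
            stem = stem[len(prefix):]
            break
    suffix_unit = None
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM_LEN:
            suffix_unit = "+" + suffix
            stem = stem[:-len(suffix)]
            break
    units.append(stem)
    if suffix_unit is not None:
        units.append(suffix_unit)
    return units


def rejoin(tokens):
    """Glue marked affixes back onto their stems, as done before scoring."""
    words = []
    pending = ""  # prefixes waiting for the next stem
    for tok in tokens:
        if tok.endswith("+"):
            pending += tok[:-1]
        elif tok.startswith("+") and words and not pending:
            words[-1] += tok[1:]
        else:
            words.append(pending + tok.lstrip("+"))
            pending = ""
    if pending:  # dangling prefix at the end of the hypothesis
        words.append(pending)
    return words


if __name__ == "__main__":
    sentence = ["wAlktAb", "ktAbhm", "byt"]
    units = [u for w in sentence for u in decompose(w)]
    print(units)
    print(rejoin(units))
```

Because the markers record which side each affix attaches to, re-joining is a purely mechanical pass over the recognizer output, and scoring can then be done against word-level references.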
Year
2006
DOI
10.1109/ICASSP.2006.1660214
Venue
ICASSP (1)
Keywords
speech recognition, natural languages, word error rate, morphological decomposition, language training, large vocabulary arabic speech recognition, acoustic training data, word recognition system, arabic broadcast news transcription, out-of-vocabulary, acoustics, data mining, frequency, automatic speech recognition, word recognition, broadcasting, training data, dictionaries
Field
Training set, Broadcasting, Arabic, Recognition system, Computer science, Word error rate, Compound, Speech recognition, Natural language, Artificial intelligence, Natural language processing, Vocabulary
DocType
Conference
Volume
1
ISSN
1520-6149
ISBN
1-4244-0469-X
Citations
24
PageRank
1.32
References
8
Authors
5
Name
Order
Citations
PageRank
Bing Xiang1140978.56
Kham Nguyen2473.64
Long Nguyen332684.60
Richard M. Schwartz42839717.76
J. Makhoul51097233.37