Title
Automatic diacritization of Arabic text using recurrent neural networks
Abstract
This paper presents a sequence transcription approach for the automatic diacritization of Arabic text. A recurrent neural network is trained to transcribe undiacritized Arabic text into fully diacritized sentences. We use a deep bidirectional long short-term memory network that builds high-level linguistic abstractions of text and exploits long-range context in both input directions. This approach differs from previous approaches in that no lexical, morphological, or syntactical analysis is performed on the data before it is processed by the network. Nonetheless, when the network output is post-processed with our error correction techniques, it achieves state-of-the-art performance, yielding average diacritic and word error rates of 2.09 % and 5.82 %, respectively, on samples from 11 books. For the LDC ATB3 benchmark, this approach reduces the diacritic error rate by 25 %, the word error rate by 20 %, and the last-letter diacritization error rate by 33 % over the best published results.
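The two headline metrics in the abstract can be illustrated with a minimal sketch. It assumes one common definition: the diacritic error rate (DER) is the fraction of letters whose diacritics differ from the reference, and the word error rate (WER) is the fraction of words containing at least one such letter. The abstract does not state the paper's exact counting conventions (for example, whether letters that carry no diacritic are counted), and the names `split_letters` and `error_rates` are hypothetical, introduced here for illustration only.

```python
# Arabic diacritic marks (tanween, short vowels, shadda, sukun): U+064B..U+0652.
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def split_letters(word):
    """Return a list of (letter, diacritics) pairs for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS and pairs:
            # Attach the mark to the most recent base letter.
            letter, marks = pairs[-1]
            pairs[-1] = (letter, marks + ch)
        elif ch not in DIACRITICS:
            pairs.append((ch, ""))
        # A mark with no preceding letter is ignored.
    return pairs


def error_rates(reference, hypothesis):
    """Return (DER, WER) as fractions in [0, 1].

    Assumes both strings contain the same words and base letters and
    differ only in their diacritics.
    """
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words or len(ref_words) != len(hyp_words):
        raise ValueError("texts must be non-empty with equal word counts")

    letters = letter_errors = word_errors = 0
    for ref_word, hyp_word in zip(ref_words, hyp_words):
        ref_pairs = split_letters(ref_word)
        hyp_pairs = split_letters(hyp_word)
        if [l for l, _ in ref_pairs] != [l for l, _ in hyp_pairs]:
            raise ValueError("base letters differ between the two texts")
        # Compare marks as sorted lists so that the order of shadda and
        # vowel on the same letter does not count as an error.
        wrong = sum(
            sorted(ref_marks) != sorted(hyp_marks)
            for (_, ref_marks), (_, hyp_marks) in zip(ref_pairs, hyp_pairs)
        )
        letters += len(ref_pairs)
        letter_errors += wrong
        word_errors += wrong > 0

    return letter_errors / letters, word_errors / len(ref_words)
```

Under this definition a single wrong case ending marks the whole word as an error, which is why word error rates are typically higher than diacritic error rates on the same output.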
Year: 2015
DOI: 10.1007/s10032-015-0242-2
Venue: International Journal on Document Analysis and Recognition
Keywords: Automatic diacritization, Arabic text, Machine learning, Sequence transcription, Recurrent neural networks, Deep neural networks, Long short-term memory
Field: Arabic, Computer science, Recurrent neural network, Long short term memory, Diacritic, Natural language processing, Artificial intelligence, Deep neural networks, Pattern recognition, Word error rate, Error detection and correction, Speech recognition, Machine learning
DocType: Journal
Volume: 18
Issue: 2
ISSN: 1433-2825
Citations: 12
PageRank: 1.01
References: 27
Authors: 6
Name                 Order  Citations  PageRank
Gheith A. Abandah    1      87         8.53
Alex Graves          2      8572       405.10
Balkees Al-Shagoor   3      12         1.01
Alaa Arabiyat        4      12         1.01
Fuad T. Jamour       5      26         3.19
Majid Al-Taee        6      27         7.52