Title
Arabic spelling error detection and correction
Abstract
A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully-inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.
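The three components named in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the four-word dictionary, the unigram counts standing in for a language model, and the function names `edit_distance`, `is_error` and `correct` are all assumptions made for illustration. Plain Levenshtein distance stands in for the error model, and the paper's edit distance re-ranker is not reproduced.

```python
from collections import Counter

# Hypothetical toy dictionary (reference word list); the paper's has 9.2 million types.
DICTIONARY = {"كتاب", "كاتب", "مكتب", "كتب"}

# Hypothetical unigram counts standing in for a trained language model.
UNIGRAM_COUNTS = Counter({"كتاب": 50, "كاتب": 20, "مكتب": 10, "كتب": 30})


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute), row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def is_error(word: str, dictionary=DICTIONARY) -> bool:
    """Detection: a word is flagged when it is absent from the dictionary."""
    return word not in dictionary


def correct(word: str, dictionary=DICTIONARY, counts=UNIGRAM_COUNTS,
            max_distance: int = 2) -> str:
    """Correction: nearest dictionary word, ties broken by higher unigram count."""
    if not is_error(word, dictionary):
        return word
    candidates = []
    for cand in dictionary:
        dist = edit_distance(word, cand)
        if dist <= max_distance:
            # Negated count so that min() prefers the more frequent word.
            candidates.append((dist, -counts[cand], cand))
    if not candidates:
        return word  # no suggestion within the distance limit
    return min(candidates)[2]
```

Calling `correct("كتابب")` on this toy data picks the only dictionary word one edit away. A realistic system would generate candidates without scanning the whole dictionary and would score them in sentence context rather than by unigram frequency.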
Year: 2016
DOI: 10.1017/S1351324915000030
Venue: NATURAL LANGUAGE ENGINEERING
Field: Edit distance, Arabic, Computer science, Speech recognition, Error detection and correction, Artificial intelligence, Natural language processing, Spelling, Word processing, Language model
DocType: Journal
Volume: 22
Issue: 5
ISSN: 1351-3249
Citations: 3
PageRank: 0.40
References: 10
Authors: 5
Name                  Order  Citations  PageRank
Mohammed Attia        1      146        16.51
Pavel Pecina          2      558        52.31
Samih Younes          3      38         11.26
Khaled F. Shaalan     4      506        39.80
Josef van Genabith    5      1037       105.64