Title
A hybrid model for spelling error detection and correction for Urdu language
Abstract
Detecting and correcting misspelled words in written text is of great importance in many natural language processing applications. Errors can be broadly classified into two groups, namely spelling errors and contextual errors. Spelling errors occur when the misspelled words do not exist in a dictionary and are meaningless, while contextual errors occur when the words do exist in the dictionary but their use is not as intended by the writer. This paper presents an "Urdu Spell Checker" that detects incorrect spellings of a word using the widely used lexicon lookup approach and provides a list of candidate words containing correct spellings by applying the edit distance technique, which covers all types of spelling errors. To identify the best candidate word, this paper proposes a hybrid model that ranks the words in the candidate word list. Multiple ranking techniques such as Soundex, Shapex, LCS and N-gram are used standalone, as well as in combination, to determine the best technique in terms of F1 score. A dictionary containing 48,551 words is developed from the UMC corpus and an Urdu newspaper corpus. Our hybrid model achieves an F1 score of 94.02% when considering the top five suggested words and an F1 score of 88.29% when considering the top one suggested word.
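The pipeline described in the abstract (lexicon lookup for detection, edit distance for candidate generation, then ranking) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `max_distance` and `top_k` parameters, the toy English lexicon, and the choice of LCS length as the only ranking signal are all assumptions made for illustration. The paper's hybrid model also combines Soundex, Shapex and N-gram scores, which are not shown here, and the sketch uses plain Levenshtein distance, which may differ from the edit distance variant used in the paper.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute) using two DP rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def suggest(word: str, lexicon: Iterable[str],
            max_distance: int = 2, top_k: int = 5) -> List[str]:
    """Return up to top_k ranked corrections, or [] if the word is in the lexicon.

    Hypothetical helper: detection is a plain lexicon lookup, candidates are
    lexicon entries within max_distance edits, and ranking prefers a longer
    LCS, then a smaller edit distance, then alphabetical order.
    """
    lexicon = set(lexicon)
    if word in lexicon:  # lexicon lookup: word is spelled correctly
        return []
    candidates = [w for w in lexicon if edit_distance(word, w) <= max_distance]
    candidates.sort(key=lambda w: (-lcs_length(word, w), edit_distance(word, w), w))
    return candidates[:top_k]
```

Calling `suggest("spel", {"spell", "spelling", "shell", "smell", "checker"})` keeps only the entries within two edits of the input and ranks them by LCS length. Scanning the whole lexicon is acceptable for a toy example, but for a dictionary of tens of thousands of words an index or generated-edits approach would likely be needed.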
Year: 2021
DOI: 10.1007/s00521-021-06110-7
Venue: NEURAL COMPUTING & APPLICATIONS
Keywords: Spelling errors, Candidate words, Error detection, Error correction, Spell checker
DocType: Journal
Volume: 33
Issue: 21
ISSN: 0941-0643
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name | Order | Citations | PageRank
Romila Aziz | 1 | 0 | 0.34
Muhammad Waqas Anwar | 2 | 0 | 0.68
Muhammad Hasan Jamal | 3 | 0 | 0.34
Usama Ijaz Bajwa | 4 | 13 | 5.04