Abstract |
---|
Automatic recognition of offline Arabic text remains a major challenge because of the nature of Arabic script. Researchers' attention to this problem has grown recently, and a variety of methods have been applied in this area. This paper presents a comparative study of four OCR (Optical Character Recognition) post-processing error correction techniques. We evaluate their impact using two recognition approaches: a lexicon-driven approach, with and without OOV (Out Of Vocabulary) words, and a lexicon-free approach. An AOCR (Arabic Optical Character Recognition) system is developed for this purpose, based on a segmentation-free HMM (Hidden Markov Model) approach. A sliding window is moved across the line image from right to left to extract histogram of oriented gradients (HOG) features. Experiments are carried out on the KAFD database under different scenarios and reveal a significant improvement in the OCR error correction rate. |
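
The right-to-left sliding window mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sliding_windows_rtl`, the window `width`, the `shift`, and the toy image are all assumptions made for illustration, and the HOG computation that the paper applies to each window is left out.

```python
from typing import Iterator, List

# A line image as rows of pixel intensities (hypothetical representation).
Image = List[List[int]]


def sliding_windows_rtl(image: Image, width: int, shift: int) -> Iterator[Image]:
    """Yield fixed-width column windows, starting at the right edge of the image.

    Arabic is written right to left, so the first window covers the
    rightmost columns. Columns left over at the left edge that cannot
    fill a whole window are dropped in this sketch.
    """
    if width <= 0 or shift <= 0:
        raise ValueError("width and shift must be positive")
    n_cols = len(image[0]) if image else 0
    right = n_cols
    while right - width >= 0:
        left = right - width
        # Slice the same column range out of every row.
        yield [row[left:right] for row in image]
        right -= shift


# Toy 2x6 "image"; the values just make the column positions visible.
line = [
    [0, 1, 2, 3, 4, 5],
    [10, 11, 12, 13, 14, 15],
]
windows = list(sliding_windows_rtl(line, width=3, shift=2))
```

In a full system, each window would be passed to a feature extractor (HOG in the paper) and the resulting feature sequence fed to the HMM. Overlapping windows (`shift` smaller than `width`) are a common choice in such setups, but the abstract does not state the values used.
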
Year | DOI | Venue |
---|---|---|
2016 | 10.1007/978-3-319-52941-7_27 | PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS 2016) |
Keywords | Field | DocType |
AOCR,HMM,Lexicon,Language model,Post-processing,Sequence alignment | Histogram,Sliding window protocol,Computer science,Optical character recognition,Speech recognition,Lexicon,Natural language processing,Artificial intelligence,Hidden Markov model,Right-to-left,Language model,Arabic script | Conference |
Volume | ISSN | Citations |
552 | 2194-5357 | 0 |
PageRank | References | Authors |
0.34 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sana Khamekhem Jemni | 1 | 1 | 1.72 |
Yousri Kessentini | 2 | 0 | 0.34 |
Slim Kanoun | 3 | 209 | 20.14 |