Abstract | ||
---|---|---|
Optical character recognition for Arabic text has received far less attention from researchers than Latin text. The field has been actively explored only in the last two decades, owing to the complexity of Arabic writing and to its reliance on a critical step, segmentation: first from text to lines, then from lines to words, and finally from words to characters. For historical documents, segmentation is more complicated because of the absence of writing rules and the poor quality of the documents. In this paper we present a projection-based technique for segmenting the text of ancient Arabic documents into lines. To overcome the problem of overlapping and touching lines, the most challenging problem facing segmentation systems, pre-processing operations are first applied for binarization and noise reduction. Second, a skew correction technique is proposed, together with a space-following algorithm that separates the lines from each other. The segmentation method is applied to four representations of the text image: the original binary image and three representations obtained by transforming the input image into (1) a smeared image using the RLSA algorithm, (2) up-to-down transitions, and (3) an image smoothed by a Gaussian filter. The obtained results are promising and are compared in terms of accuracy and time cost. The methods are evaluated on a private set of 129 historical document images provided by the Al-Qaraouiyine Library. |
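
The projection idea named in the abstract can be illustrated with a minimal sketch. This is a generic horizontal projection profile, not the authors' implementation: it assumes an already binarized, deskewed image in which nonzero pixels are ink, and it does not include the skew correction, the space-following step, or any handling of touching or overlapping lines. The function name `segment_lines` and the `min_ink` threshold are hypothetical, chosen only for illustration.

```python
import numpy as np

def segment_lines(binary, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    binary:  2D array; nonzero pixels are treated as ink (assumed convention).
    min_ink: hypothetical threshold, the minimum ink pixels for a row to
             count as part of a text line.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = (np.asarray(binary) != 0).sum(axis=1)
    is_text = profile >= min_ink

    # Pad with False so every run of text rows has a rising and a falling edge.
    padded = np.concatenate(([False], is_text, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])

    # Edges alternate: even positions open a line, odd positions close it.
    return [(int(s), int(e)) for s, e in zip(changes[0::2], changes[1::2])]


# Synthetic page with two well-separated "lines" of ink.
page = np.zeros((10, 8), dtype=np.uint8)
page[1:3, 2:6] = 1
page[5:8, 1:7] = 1
lines = segment_lines(page)
```

On clean, well-spaced text the valleys of the profile fall between lines, which is why a simple threshold is enough here. Historical pages with skew or touching lines would need the additional steps the abstract describes.
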
Year | DOI | Venue |
---|---|---|
2019 | 10.1007/978-3-030-32959-4_13 | Communications in Computer and Information Science |
Keywords | DocType | Volume |
Arabic text line segmentation,Historical documents,Projection,Skew correction | Conference | 1108 |
ISSN | Citations | PageRank |
1865-0929 | 0 | 0.34 |
References | Authors | |
0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Abdelhay Zoizou | 1 | 0 | 0.34 |
Arsalane Zarghili | 2 | 7 | 5.87 |
Ilham Chaker | 3 | 0 | 0.34 |