Title
Skew Correction and Text Line Extraction of Arabic Historical Documents.
Abstract
Optical character recognition for Arabic text has received far less attention from researchers than for Latin text. The field has been actively explored only in the last two decades, owing to the complexity of Arabic writing and to its dependence on a critical step, segmentation: first from text to lines, then from lines to words, and finally from words to characters. For historical documents, segmentation is even harder because of the absence of writing rules and the poor quality of the documents. In this paper we present a projection-based technique for segmenting the text of ancient Arabic documents into lines. To overcome overlapping and touching lines, the most challenging problem facing segmentation systems, we first apply pre-processing operations for binarization and noise reduction. Second, a skew correction technique is proposed, together with a space-following algorithm that separates lines from each other. The segmentation method is applied to four representations of the text image: the original binary image and three representations obtained by transforming the input image into (1) an image smeared with the RLSA algorithm, (2) up-to-down transitions, and (3) an image smoothed with a Gaussian filter. The results obtained are promising and are compared in terms of accuracy and time cost. The methods are evaluated on a private set of 129 historical document images provided by Al-Qaraouiyine Library.
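As an illustration of the general idea behind projection-based line segmentation, the following is a minimal sketch and not the authors' implementation. It assumes a binary image stored as a NumPy array in which ink pixels are 1 and background pixels are 0; the function names and the `min_ink` threshold are hypothetical. It does not handle skew, overlapping lines, or touching lines, which the paper addresses with separate steps.

```python
import numpy as np


def projection_profile(binary):
    """Count ink pixels (value 1) in each row of a binary image."""
    return binary.sum(axis=1)


def split_lines(binary, min_ink=1):
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    A text line is taken to be a maximal run of consecutive rows whose
    projection value is at least `min_ink` (a hypothetical threshold).
    """
    has_ink = projection_profile(binary) >= min_ink
    # Pad with False so that every run has a detectable start and end.
    padded = np.concatenate(([False], has_ink, [False]))
    # Indices where the ink/no-ink state flips: starts and ends alternate.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(top), int(bottom))
            for top, bottom in zip(changes[0::2], changes[1::2])]


# Toy page with two "lines" of ink separated by blank rows.
page = np.zeros((10, 8), dtype=np.uint8)
page[1:3, 2:6] = 1
page[5:8, 1:7] = 1
lines = split_lines(page)
```

On clean, well-separated lines the blank rows between runs are enough to find the boundaries; on skewed or touching lines the profile has no clear valleys, which is why skew correction and a separate line-splitting step are needed.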
Year: 2019
DOI: 10.1007/978-3-030-32959-4_13
Venue: Communications in Computer and Information Science
Keywords: Arabic text line segmentation, Historical documents, Projection, Skew correction
DocType: Conference
Volume: 1108
ISSN: 1865-0929
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name               Order  Citations  PageRank
Abdelhay Zoizou    1      0          0.34
Arsalane Zarghili  2      7          5.87
Ilham Chaker       3      0          0.34