Title
Text extraction from degraded document images
Abstract
This work presents a robust segmentation method for extracting text from historical document images. The method is based on Markovian-Bayesian clustering on local graphs at both the pixel and regional scales, and consists of three steps. In the first step, an over-segmented map of the input image is created; this map provides rich and accurate semi-mosaic fragments. In the second step, similar and adjoining sub-regions of the map are merged to form accurate text shapes. In the final step, the output of the second step is segmented by clustering with a fixed number of classes. The method relies heavily on local and spatial correlation and coherence, both within the image and between stroke parts, and is therefore very robust to degradation. The resulting segmented text is smooth, and weak connections and loops are preserved thanks to the robust nature of the method. The output can be used in subsequent skeletonization processes, which require the text topology to be preserved to achieve high performance. The method is tested on real degraded document images, with promising results.
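The three-step structure described in the abstract (over-segment, merge adjoining similar fragments, cluster into a fixed number of classes) can be illustrated with a toy sketch. This is not the authors' Markovian-Bayesian clustering: each step is replaced by a much simpler stand-in (intensity-bin flood fill, mean-difference merging, two-class k-means), and all function names, `bin_width`, and `tol` are hypothetical choices made for illustration only.

```python
from collections import deque

def over_segment(img, bin_width=16):
    """Step 1 stand-in: label 4-connected pixels that share a fine intensity bin."""
    h, w = len(img), len(img[0])
    labels = [[-1] * w for _ in range(h)]
    n = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            q = img[sy][sx] // bin_width
            labels[sy][sx] = n
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and img[ny][nx] // bin_width == q):
                        labels[ny][nx] = n
                        queue.append((ny, nx))
            n += 1
    return labels, n

def merge_regions(img, labels, n, tol=40):
    """Step 2 stand-in: merge adjoining fragments whose mean intensities differ by < tol."""
    h, w = len(img), len(img[0])
    total, count = [0.0] * n, [0] * n
    for y in range(h):
        for x in range(w):
            total[labels[y][x]] += img[y][x]
            count[labels[y][x]] += 1
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for y in range(h):
        for x in range(w):
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w:
                    a, b = find(labels[y][x]), find(labels[ny][nx])
                    if a != b and abs(total[a] / count[a] - total[b] / count[b]) < tol:
                        parent[b] = a
                        total[a] += total[b]
                        count[a] += count[b]
    return [[find(l) for l in row] for row in labels]

def two_class_cluster(img, regions, iters=10):
    """Step 3 stand-in: two-class k-means on region means; 1 marks the darker class."""
    sums, counts = {}, {}
    for row_px, row_r in zip(img, regions):
        for v, r in zip(row_px, row_r):
            sums[r] = sums.get(r, 0.0) + v
            counts[r] = counts.get(r, 0) + 1
    means = {r: sums[r] / counts[r] for r in sums}
    lo, hi = min(means.values()), max(means.values())

    def is_dark(r):
        return abs(means[r] - lo) <= abs(means[r] - hi)

    for _ in range(iters):
        dark = [r for r in means if is_dark(r)]
        bright = [r for r in means if not is_dark(r)]
        if not dark or not bright:
            break
        lo = sum(sums[r] for r in dark) / sum(counts[r] for r in dark)
        hi = sum(sums[r] for r in bright) / sum(counts[r] for r in bright)
    return [[1 if is_dark(r) else 0 for r in row] for row in regions]

def segment_text(img):
    """Run the three stand-in steps in sequence on a grayscale image (list of rows)."""
    labels, n = over_segment(img)
    regions = merge_regions(img, labels, n)
    return two_class_cluster(img, regions)
```

Calling `segment_text` on a small grayscale image given as a list of rows returns a binary mask of the same shape in which 1 marks the darker class, taken here to be text. Unlike the method in the abstract, this sketch uses no Markovian or Bayesian modelling, so it illustrates only the order of the steps and not the robustness to degradation.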
Year
2010
DOI
10.1109/EUVIP.2010.5699135
Venue
Visual Information Processing
Keywords
Bayes methods, Markov processes, document image processing, feature extraction, image segmentation, image thinning, text analysis, Markovian-Bayesian clustering, degraded document image, historical document image, robust segmentation method, skeletonization process, text extraction, text topology, Document image, Graph-partitioning, Image binarization, Image segmentation, MRF
Field
Histogram, Computer vision, Pattern recognition, Segmentation, Computer science, Feature extraction, Image segmentation, Robustness (computer science), Skeletonization, Artificial intelligence, Pixel, Cluster analysis
DocType
Conference
ISBN
978-1-4244-7288-8
Citations
2
PageRank
0.36
References
8
Authors
3
Name                    Order  Citations  PageRank
Rachid Hedjam           1      96         10.35
Reza Farrahi Moghaddam  2      469        34.39
Mohamed Cheriet         3      2047       238.58