Title
Automatic extraction of text regions from document images by multilevel thresholding and k-means clustering
Abstract
Textual data plays an important role in a number of applications such as image database indexing, document understanding, and image-based web searching. The goal of automatic text extraction from real-life document images, without a character recognition module, is to identify image regions that contain only text. These textual regions can then either be passed to an optical character recognition application or highlighted to draw the user's attention. In this paper, we propose a method consisting of three stages: preprocessing, which improves the contrast of the grayscale image; multi-level thresholding, which separates textual regions from non-textual objects such as graphics, pictures, and complex backgrounds; and a heuristic filter and a recursive filter, which localize text within the textual regions. In many of these applications, it is not necessary to identify every text region; we therefore emphasize identifying important text regions of relatively large size and high contrast. Experimental results on a dataset of real-life images demonstrate that the proposed method is effective in identifying textual regions with varying illumination, size, and font against various types of background.
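The multi-level thresholding stage can be illustrated with a minimal sketch. The abstract does not spell out the procedure, so the following assumes, based on the title and keywords, that pixel intensities are grouped with one-dimensional k-means and that thresholds fall midway between adjacent cluster centers. The function names `kmeans_thresholds` and `quantize`, the default `k=3`, and the initialization scheme are hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np

def kmeans_thresholds(gray, k=3, iters=50):
    """Cluster pixel intensities with 1-D k-means (Lloyd's algorithm) and
    return the k-1 thresholds midway between adjacent cluster centers.
    Assumption: this is an illustrative stand-in, not the paper's exact method."""
    values = np.asarray(gray, dtype=np.float64).ravel()
    # Deterministic start: spread the initial centers over the intensity range.
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        # Assign every pixel to its nearest center (N x k distance matrix).
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = values[labels == j]
            if members.size:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    centers = np.sort(centers)
    return (centers[:-1] + centers[1:]) / 2.0

def quantize(gray, thresholds):
    """Map each pixel to the index of the intensity level it falls into."""
    return np.digitize(np.asarray(gray), thresholds)

if __name__ == "__main__":
    # Tiny synthetic "image" with dark, mid-gray, and bright pixels.
    img = np.array([[20, 20, 128], [128, 230, 230]])
    t = kmeans_thresholds(img, k=3)
    print(t)
    print(quantize(img, t))
```

Each resulting level could then be examined separately, for example by connected-component analysis, before any filtering step. The dense distance matrix keeps the sketch short; for large images, clustering the 256-bin histogram instead would be cheaper.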
Year
2015
DOI
10.1109/ICIS.2015.7166615
Venue
2015 IEEE/ACIS 14th International Conference on Computer and Information Science (ICIS)
Keywords
Multilevel, K-mean, Connected Component
Field
Graphics, Computer vision, k-means clustering, Pattern recognition, Document clustering, Computer science, Search engine indexing, Optical character recognition, Artificial intelligence, Recursive filter, Thresholding, Grayscale
DocType
Conference
Citations
1
PageRank
0.34
References
15
Authors
4
Name | Order | Citations | PageRank
Hoai Nam Vu | 1 | 1 | 0.34
Tuan Anh Tran | 2 | 28 | 3.22
In Seop Na | 3 | 42 | 13.83
Soo Hyung Kim | 4 | 29 | 6.39