Shape Codebook Based Handwritten And Machine Printed Text Zone Extraction - Citegraph

Paper Info

Title
Shape Codebook Based Handwritten And Machine Printed Text Zone Extraction

Abstract
In this paper, we present a novel method for extracting handwritten and printed text zones from noisy document images with mixed content. We use Triple-Adjacent-Segment (TAS) based features which encode local shape characteristics of text in a consistent manner. We first construct two codebooks of the shape features extracted from a set of handwritten and printed text documents respectively. We then compute the normalized histogram of codewords for each segmented zone and use it to train a Support Vector Machine (SVM) classifier. The codebook based approach is robust to the background noise present in the image and TAS features are invariant to translation, scale and rotation of text. In experiments, we show that a pixel-weighted zone classification accuracy of 98% can be achieved for noisy Arabic documents. Further, we demonstrate the effectiveness of our method for document page classification and show that a high precision can be achieved for the detection of machine printed documents. The proposed method is robust to the size of zones, which may contain text content at line or paragraph level.
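The zone descriptor described in the abstract (a normalized histogram of codewords per segmented zone) can be illustrated with a minimal sketch. Several things here are assumptions, not details from the paper: TAS features are stood in for by toy 2-D vectors, each feature is assigned to its nearest codeword by Euclidean distance, and the histograms over the handwritten and printed codebooks are concatenated into one vector. The function names are hypothetical. In the described pipeline, a vector of this kind would then be used to train the SVM classifier.

```python
from math import dist


def nearest_codeword(feature, codebook):
    """Return the index of the codeword closest to `feature`.

    Assumption: Euclidean distance; the abstract does not name the metric.
    """
    return min(range(len(codebook)), key=lambda i: dist(feature, codebook[i]))


def codeword_histogram(features, codebook):
    """Normalized histogram of codeword assignments for one zone."""
    counts = [0] * len(codebook)
    for feature in features:
        counts[nearest_codeword(feature, codebook)] += 1
    total = sum(counts)
    if total == 0:
        # A zone with no extracted features gets an all-zero histogram.
        return [0.0] * len(codebook)
    return [count / total for count in counts]


def zone_descriptor(features, handwritten_codebook, printed_codebook):
    """Concatenate the histograms over both codebooks (an assumed design)."""
    return (codeword_histogram(features, handwritten_codebook)
            + codeword_histogram(features, printed_codebook))


# Toy data: 2-D stand-ins for shape features, not real TAS descriptors.
handwritten_codebook = [(0.0, 0.0), (1.0, 1.0)]
printed_codebook = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
zone_features = [(0.1, 0.0), (0.9, 1.0), (1.0, 0.1), (0.2, 0.1)]

descriptor = zone_descriptor(zone_features, handwritten_codebook, printed_codebook)
print(descriptor)
```

Because each histogram is normalized by the number of features in the zone, the descriptor does not grow with zone size, which is consistent with the abstract's claim that the method handles zones at line or paragraph level.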

Year	2011
DOI	10.1117/12.876725
Venue	DOCUMENT RECOGNITION AND RETRIEVAL XVIII
Keywords	zone classification, zone segmentation, page classification, noisy documents, handwriting, Arabic
Field	Computer vision, Histogram, Background noise, Normalization (statistics), Pattern recognition, Computer science, Support vector machine, Feature extraction, Pixel, Artificial intelligence, Classifier (linguistics), Codebook
DocType	Conference
Volume	7874
ISSN	0277-786X
Citations	12
PageRank	0.84
References	20
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
Jayant Kumar	1	173	11.11
Rohit Prasad	2	465	39.06
Huaigu Cao	3	347	29.09
Wael Abd-Almageed	4	248	24.52
David Doermann	5	4313	312.70
Premkumar Natarajan	6	874	79.46
