Title
Handwritten and Machine Printed Text Separation in Document Images Using the Bag of Visual Words Paradigm
Abstract
Machine printed and handwritten text may appear in the same document image in many types of documents, from forms to archive documents and annotated books, which causes significant problems in a digitisation and recognition pipeline. The two types of text must therefore be separated before a different recognition methodology is applied to each. This paper proposes a new approach that identifies and separates handwritten from machine printed text using the Bag of Visual Words (BoVW) paradigm. First, blocks of interest are detected in the document image. For each block, a descriptor is calculated based on the BoVW. A Support Vector Machine classifier then makes the final characterization of each block as Handwritten, Machine Printed or Noise. The promising performance of the proposed approach is shown using a consistent evaluation methodology that couples meaningful measures with a new dataset.
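The descriptor step in the abstract can be illustrated with a minimal sketch. It assumes each block yields a set of local feature vectors and that a codebook of visual words is already available. The abstract does not say which local features or codebook construction the authors use, so the 2-D toy vectors, the three-word codebook, and the function names below are hypothetical. The resulting histogram is the kind of fixed-length vector that could then be passed to a Support Vector Machine classifier.

```python
from math import dist


def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))


def bovw_histogram(descriptors, codebook):
    """Build an L1-normalised histogram of visual-word counts for one block."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    # An empty block gets an all-zero histogram instead of a division by zero.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy data (assumed for illustration only): three visual words, four local descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
block_descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]

histogram = bovw_histogram(block_descriptors, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

The histogram length equals the codebook size regardless of how many local descriptors a block produces, which is what makes blocks of different sizes comparable by a single classifier.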
Year
2012
DOI
10.1109/ICFHR.2012.207
Venue
ICFHR
Keywords
machine printed, handwritten text, new dataset, visual words paradigm, support vector machine classifier, different recognition methodology, machine printed text separation, recognition pipeline, new approach, document image, text analysis, image classification, support vector machines
Field
Scale-invariant feature transform, Pattern recognition, Bag-of-words model in computer vision, Document image processing, Support vector machine classifier, Computer science, Support vector machine, Document processing, Artificial intelligence, Contextual image classification, Intelligent word recognition
DocType
Conference
ISSN
2167-6445
Citations
9
PageRank
0.57
References
22
Authors
5
Name | Order | Citations | PageRank
Konstantinos Zagoris | 1 | 231 | 17.12
Ioannis Pratikakis | 2 | 1065 | 57.91
Apostolos Antonacopoulos | 3 | 378 | 36.45
Basilis Gatos | 4 | 773 | 43.34
Nikos Papamarkos | 5 | 546 | 37.16