Title
Boosting Based Text And Non-Text Region Classification
Abstract
Layout analysis is a crucial step in document image understanding and information retrieval, and it depends on page segmentation and block classification. This paper describes an algorithm for extracting blocks from document images and a boosting-based method for classifying those blocks as machine-printed text or non-text. The feature vector fed into the boosting classifier consists of a four-direction run-length histogram and connected-component features computed on both background and foreground. Combining these features through a boosting classifier yields an accuracy of 99.5% on our test collection.
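As an illustration of the four-direction run-length histogram mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a binarized block stored as a list of rows with 1 for foreground and 0 for background. The function names, the `max_len` bin count, and the per-direction normalization are hypothetical choices made for this example; the abstract does not specify them.

```python
# Step vectors (row delta, column delta) for the four scan directions.
# The direction names are illustrative, not taken from the paper.
DIRECTIONS = {
    "horizontal": (0, 1),
    "vertical": (1, 0),
    "diagonal": (1, 1),
    "anti_diagonal": (1, -1),
}


def run_lengths(image, step, value=1):
    """Return the lengths of all maximal runs of `value` along `step`."""
    rows, cols = len(image), len(image[0])
    dr, dc = step
    lengths = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != value:
                continue
            # Skip pixels that continue a run started at an earlier pixel.
            pr, pc = r - dr, c - dc
            if 0 <= pr < rows and 0 <= pc < cols and image[pr][pc] == value:
                continue
            # Walk forward from the run start and count its pixels.
            length, rr, cc = 0, r, c
            while 0 <= rr < rows and 0 <= cc < cols and image[rr][cc] == value:
                length += 1
                rr, cc = rr + dr, cc + dc
            lengths.append(length)
    return lengths


def run_length_histogram(image, max_len=8, value=1):
    """Concatenate normalized run-length histograms for the four directions.

    Runs longer than `max_len` are counted in the last bin (an assumption).
    """
    features = []
    for step in DIRECTIONS.values():
        hist = [0] * max_len
        for length in run_lengths(image, step, value):
            hist[min(length, max_len) - 1] += 1
        total = sum(hist)
        features.extend(h / total if total else 0.0 for h in hist)
    return features


if __name__ == "__main__":
    block = [
        [1, 1, 0],
        [0, 1, 0],
        [0, 1, 1],
    ]
    print(run_length_histogram(block, max_len=3))
```

Calling `run_length_histogram(block, value=0)` gives the same kind of histogram for background runs. In a pipeline like the one the abstract outlines, such a vector would be concatenated with connected-component features before being passed to a boosting classifier.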
Year: 2011
DOI: 10.1117/12.876736
Venue: DOCUMENT RECOGNITION AND RETRIEVAL XVIII
Keywords: Document image analysis, layout analysis, zone classification, adaptive boosting
Field: Histogram, Computer vision, Feature vector, Pattern recognition, Computer science, Document clustering, Segmentation, Document layout analysis, Boosting (machine learning), Artificial intelligence, Connected component, Classifier (linguistics)
DocType: Conference
Volume: 7874
ISSN: 0277-786X
Citations: 0
PageRank: 0.34
References: 3
Authors: 2
Name, Order, Citations, PageRank
Binqing Xie, 1, 0, 0.34
Gady Agam, 2, 391, 43.99