Title
Unsupervised Method To Generate Page Templates
Abstract
In this paper, we propose a method for automatically inferring the different page templates used to lay out the content of a document. The first step of the method is a logical analysis of the document. Depending on the coverage of this step, a certain number of document elements are labeled. Geometric relations are then computed between these labeled elements, and page template candidates are generated from frequently related elements. A fuzzy matching operation selects the most frequent and relevant page templates for a given document. Such page templates can be used to correct errors produced during the earlier steps of document analysis: zoning, OCR, and logical analysis. The method was evaluated on the INEX book track collection.
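The pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Element` type, the coarse grid used as a stand-in for geometric relations, exact counting of feature sets as a stand-in for mining frequently related elements, and Jaccard similarity as a stand-in for the fuzzy matching operation.

```python
from collections import Counter
from typing import NamedTuple


class Element(NamedTuple):
    """A labeled page element (hypothetical representation)."""
    label: str  # logical label, e.g. "header" or "page-number"
    x: float    # left edge as a fraction of page width
    y: float    # top edge as a fraction of page height


def features(page, grid=10):
    """Describe a page as a set of (label, column, row) cells on a coarse grid."""
    return frozenset((e.label, int(e.x * grid), int(e.y * grid)) for e in page)


def candidate_templates(pages, min_support=2):
    """Keep the feature sets that occur on at least min_support pages."""
    counts = Counter(features(p) for p in pages)
    return [t for t, n in counts.items() if n >= min_support]


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_template(page, templates, threshold=0.5):
    """Return the most similar template, or None if none reaches the threshold."""
    f = features(page)
    best = max(templates, key=lambda t: jaccard(f, t), default=None)
    if best is None or jaccard(f, best) < threshold:
        return None
    return best
```

In this sketch, a page whose page number was lost during OCR would still match the template of its neighbours as long as the remaining elements overlap enough. The matched template then indicates where the missing element was expected, which is the kind of error correction the abstract mentions.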
Year: 2011
DOI: 10.1117/12.873160
Venue: DOCUMENT RECOGNITION AND RETRIEVAL XVIII
Keywords: document layout analysis, geometrical analysis, logical analysis, page template, typography, unsupervised learning
Field: Data mining, Document analysis, Computer science, Unsupervised learning, Artificial intelligence, Information retrieval, Pattern recognition, Geometric analysis, Document layout analysis, Optical character recognition, Error detection and correction, Approximate string matching, Template
DocType: Conference
Volume: 7874
ISSN: 0277-786X
Citations: 0
PageRank: 0.34
References: 4
Authors: 1
Name: Hervé Déjean
Order: 1
Citations: 377
PageRank: 48.52