Abstract | ||
---|---|---|
In this paper, we propose a method for automatically inferring the page templates used to lay out a document's content. The first step of the method performs a logical analysis of the document. Depending on the coverage of this step, a certain number of document elements are labeled. Geometric relations are then computed between these labeled elements, and page template candidates are generated using frequently related elements. A fuzzy matching operation selects the most frequent and relevant page templates for a given document. Such page templates can be used to correct errors produced during the earlier steps of document analysis: zoning, OCR, and logical analysis. The method was evaluated on the INEX book track collection. |
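
The pipeline described in the abstract (labeled elements, geometric features, frequent candidates, fuzzy matching) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the use of a single left-edge coordinate as the only geometric feature, and the `grid`, `min_support`, and `tol` parameters are all assumptions chosen for brevity.

```python
from collections import Counter

# Assumption: a page is a list of (label, x_left) pairs produced by a
# logical-analysis step; real systems would use richer geometry.

def page_signature(page, grid=10):
    """Quantize each labeled element's left edge to a grid to absorb small jitter."""
    return frozenset((label, round(x / grid) * grid) for label, x in page)

def infer_templates(pages, min_support=2, grid=10):
    """Return candidate templates (signatures) seen on at least min_support pages,
    most frequent first."""
    counts = Counter(page_signature(p, grid) for p in pages)
    return [sig for sig, n in counts.most_common() if n >= min_support]

def fuzzy_match(page, template, tol=15):
    """Fraction of template elements with a same-label page element within tol units."""
    if not template:
        return 0.0
    hits = sum(
        any(label == t_label and abs(x - t_x) <= tol for label, x in page)
        for t_label, t_x in template
    )
    return hits / len(template)
```

A page whose score under `fuzzy_match` is high but below 1.0 is a candidate for correction: the template elements it lacks may indicate a zoning, OCR, or labeling error on that page.
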
Year | DOI | Venue |
---|---|---|
2011 | 10.1117/12.873160 | DOCUMENT RECOGNITION AND RETRIEVAL XVIII |
Keywords | Field | DocType |
document layout analysis, geometrical analysis, logical analysis, page template, typography, unsupervised learning | Data mining, Document analysis, Computer science, Unsupervised learning, Artificial intelligence, Information retrieval, Pattern recognition, Geometric analysis, Document layout analysis, Optical character recognition, Error detection and correction, Approximate string matching, Template | Conference |
Volume | ISSN | Citations |
7874 | 0277-786X | 0 |
PageRank | References | Authors |
0.34 | 4 | 1 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hervé Déjean | 1 | 377 | 48.52 |