Title
DMOS, a generic document recognition method: application to table structure analysis in a general and in a specific way
Abstract
We show in this paper one of the many benefits of designing a generic recognition system: the possibility of producing either general or specific systems. We propose the Description and Modification of Segmentation (DMOS) method, which consists of a new grammatical language, the Enhanced Position Formalism (EPF), and an associated parser able to deal with noise. From an EPF description of a kind of document structure, a new recognition system is produced by compilation. This method has been successfully used to produce recognition systems for musical scores, mathematical formulae and even tennis courts in videos. The DMOS generic method separates knowledge from program. Therefore, for a single kind of document, such as table structures, it is possible to define more or less specific descriptions with EPF and thus produce more or less specific recognition systems. For example, we have been able to produce a general recognition system for table structures. It can recognize the hierarchical organization of a table made with rulings, whatever the number and size of columns and rows and the depth of the hierarchy it contains, as long as the document is of reasonable quality (no missing rulings, for example). We present how the description is written in EPF so that it is general enough to recognize very different table organizations. With the same DMOS generic method, we have also been able to easily define a specific recognition system for the table structure of quite damaged military forms of the 19th century. This specific description was necessary to compensate for missing information about the table structure of those military forms, caused by very poor quality or hidden parts of the table. This system has been successfully validated on 88,745 images, showing that the DMOS generic method can be used at an industrial level.
Year
2006
DOI
10.1007/s10032-005-0148-5
Venue
IJDAR
Keywords
document structure, structure analysis
Field
Row, Information structure, Pattern recognition, Computer science, Segmentation, Document Structure Description, Artificial intelligence, Parsing, Hierarchy, Machine learning, Complete information, Hierarchical organization
DocType
Journal
Volume
8
Issue
2-3
ISSN
1433-2825
Citations
35
PageRank
1.56
References
27
Authors
1
Name
Bertrand Coüasnon
Order
1
Citations
169
PageRank
19.22