Title
Structure detection and segmentation of documents using 2D stochastic context-free grammars.
Abstract
In this paper we define a bidimensional extension of stochastic context-free grammars for structure detection and segmentation of images of documents. Two sets of text classification features are used to perform an initial classification of each zone of the page. Then, the document segmentation is obtained as the most likely hypothesis according to a stochastic grammar. We used a dataset of historical marriage license books to validate this approach. We also tested several inference algorithms for probabilistic graphical models and the results showed that the proposed grammatical model outperformed the other methods. Furthermore, grammars also provide the document structure along with its segmentation.
Year
2015
DOI
10.1016/j.neucom.2014.08.076
Venue
Neurocomputing
Keywords
Document image analysis, Stochastic context-free grammars, Text classification features
Field
Rule-based machine translation, Scale-space segmentation, Context-free grammar, Computer science, Document Structure Description, Artificial intelligence, Natural language processing, L-attributed grammar, Pattern recognition, Segmentation, Graphical model, Stochastic grammar, Machine learning
DocType
Journal
Volume
150
Issue
PA
ISSN
0925-2312
Citations
1
PageRank
0.36
References
23
Authors
5