Extracting structured data from unstructured document with incomplete resources - Citegraph

Paper Info

Title
Extracting structured data from unstructured document with incomplete resources

Abstract
We present a method for extracting structured elements of information, called structured data (sdata), from OCR'ed pages. The method first analyzes the layout of the page, building several concurrent layout structures. Then a tagging step is performed in order to tag textual elements based on their content. Combining the layout structures and the tagged elements, layout models for representing the structured data are inferred for the current page. These models are used to correct or tag some elements missed by the tagging step. The final set of structured data is extracted. An evaluation is presented.
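
The pipeline described in the abstract (layout analysis, content tagging, per-page model inference, then correction of missed elements) can be illustrated with a toy sketch. This is not the paper's implementation: the `Element` class, the single `PRICE` regex tagger, the use of a column index as the only layout structure, and the `min_ratio` majority threshold are all assumptions made for illustration. The sketch also only tags missed elements and does not attempt to correct wrongly tagged ones.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical content tagger: one regex that recognizes prices like "3.50".
PRICE = re.compile(r"^\d+\.\d{2}$")


@dataclass
class Element:
    text: str
    column: int                # assumed layout structure: column index on the page
    tag: Optional[str] = None  # filled in by tagging or by the inferred model


def tag_by_content(elements):
    """Tagging step: label elements whose text matches a known pattern."""
    for e in elements:
        if PRICE.match(e.text):
            e.tag = "price"


def infer_column_models(elements, min_ratio=0.5):
    """Infer a per-page model: a column is assigned a tag when more than
    min_ratio of its elements already carry that tag."""
    by_col = defaultdict(list)
    for e in elements:
        by_col[e.column].append(e)
    models = {}
    for col, items in by_col.items():
        counts = Counter(e.tag for e in items if e.tag)
        if not counts:
            continue
        tag, n = counts.most_common(1)[0]
        if n / len(items) > min_ratio:
            models[col] = tag
    return models


def apply_models(elements, models):
    """Tag elements the content tagger missed, using the column model."""
    for e in elements:
        if e.tag is None and e.column in models:
            e.tag = models[e.column]


def extract(elements):
    """Run the three steps and return the tagged (text, tag) pairs."""
    tag_by_content(elements)
    apply_models(elements, infer_column_models(elements))
    return [(e.text, e.tag) for e in elements if e.tag]


# Toy page: "2.8O" has an OCR error (letter O instead of zero), so the
# regex misses it, but the column model recovers it.
page = [
    Element("Coffee", 0), Element("3.50", 1),
    Element("Tea", 0), Element("2.8O", 1),
    Element("Scone", 0), Element("4.25", 1),
]
result = extract(page)
print(result)
```

In this toy page the regex tags two of the three entries in column 1, the inferred model assigns "price" to that column, and the OCR-damaged "2.8O" is picked up by the model rather than by its content.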

Year: 2015
DOI: 10.1109/ICDAR.2015.7333766
Venue: ICDAR '15 Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR)
Keywords: document layout analysis
Field: Information retrieval, Computer science, Document layout analysis, Data extraction, Data model
DocType: Conference
ISSN: 1520-5363
Citations: 5
PageRank: 0.55
References: 3
Authors: 1

Authors (1 row)

Name	Order	Citations	PageRank
Hervé Déjean	1	377	48.52
