Title
A Realistic Dataset for Performance Evaluation of Document Layout Analysis
Abstract
There is a significant need for a realistic dataset on which to evaluate layout analysis methods and examine their performance in detail. This paper presents a new dataset (and the methodology used to create it) based on a wide range of contemporary documents. Strong emphasis is placed on comprehensive and detailed representation of both complex and simple layouts, and on colour originals. In-depth information is recorded both at the page and region level. Ground truth is efficiently created using a new semi-automated tool and stored in a new comprehensive XML representation, the PAGE format. The dataset can be browsed and searched via a Web-based front end to the underlying database and suitable subsets (relevant to specific evaluation goals) can be selected and downloaded.
Year: 2009
DOI: 10.1109/ICDAR.2009.271
Venue: ICDAR-1
Keywords: comprehensive xml representation, layout analysis, new comprehensive xml representation, contemporary document, region classification, xml, page format, online front-ends, ground truth, detailed representation, realistic dataset, page segmentation, software performance evaluation, new dataset, datasets, in-depth information, ground truth format, document layout analysis, new semi-automated tool, performance evaluation, colour original, document handling, web-based front end, contemporary documents, pattern recognition, text analysis, pattern analysis, image recognition, front end, layout, databases, quality control, image analysis, biomedical imaging, data engineering
Field: Front and back ends, Data mining, Information retrieval, XML, Computer science, Pattern analysis, Document layout analysis, Ground truth, Information engineering, Document handling
DocType: Conference
ISSN: 1520-5363
E-ISBN: 978-0-7695-3725-2
ISBN: 978-0-7695-3725-2
Citations: 34
PageRank: 2.27
References: 4
Authors: 4
Name                      Order  Citations  PageRank
Apostolos Antonacopoulos  1      378        36.45
David Bridson             2      105        9.12
Christos Papadopoulos     3      58         4.06
Stefan Pletschacher       4      216        20.78