TOC Structure Extraction from OCR-ed Books. - Citegraph

Paper Info

Title
TOC Structure Extraction from OCR-ed Books.

Abstract
This paper addresses the task of extracting the table of contents (TOC) from OCR-ed books. Because the OCR process loses much of the layout and structural information, OCR output alone cannot support navigation. A TOC is needed to provide a convenient and quick way to locate the content of interest. In this paper, we propose a hybrid method to extract the TOC, composed of a rule-based method and an SVM-based method. The rule-based method focuses on discovering the TOC in books that have TOC pages, while the SVM-based method handles books without TOC pages. Experimental results indicate that the proposed methods achieve performance comparable to that of the other participants in the ICDAR 2011 Book Structure Extraction competition. © Springer-Verlag 2012.

Year	DOI	Venue
2011	10.1007/978-3-642-35734-3_8	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Keywords	Field	DocType
book structure extraction, table of contents, XML extraction	Information retrieval, Structure extraction, Support vector machine, Table of contents, Artificial intelligence, Engineering	Conference
Volume	Issue	ISSN
7424 LNCS	null	1611-3349
Citations	PageRank	References
4	0.47	6
Authors
5

Authors (5 rows)

Name	Order	Citations	PageRank
Caihua Liu	1	11	2.64
Jiajun Chen	2	244	45.03
Xiaofeng Zhang	3	37	9.53
Jie Liu	4	1419	116.47
Yalou Huang	5	744	53.86
