Title
TOC Structure Extraction from OCR-ed Books.
Abstract
This paper addresses the task of extracting the table of contents (TOC) from OCR-ed books. Because the OCR process loses much of the layout and structural information, OCR output alone cannot support navigation within a book. A TOC is needed to provide a convenient and quick way to locate the content of interest. We propose a hybrid method for TOC extraction that combines a rule-based method and an SVM-based method. The rule-based method focuses on recovering the TOC from books that contain TOC pages, while the SVM-based method handles books without TOC pages. Experimental results indicate that the proposed methods achieve performance comparable to that of the other participants in the ICDAR 2011 Book Structure Extraction competition. © Springer-Verlag 2012.
Year: 2011
DOI: 10.1007/978-3-642-35734-3_8
Venue: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Keywords: book structure extraction, table of contents, xml extraction
Field: Information retrieval, Structure extraction, Support vector machine, Table of contents, Artificial intelligence, Engineering
DocType: Conference
Volume: 7424 LNCS
Issue: null
ISSN: 1611-3349
Citations: 4
PageRank: 0.47
References: 6
Authors
5
Name
Order
Citations
PageRank
Caihua Liu1112.64
Jiajun Chen224445.03
Xiaofeng Zhang3379.53
Jie Liu41419116.47
Yalou Huang574453.86