Abstract | ||
---|---|---|
This paper addresses the task of extracting the table of contents (TOC) from OCR-ed books. Because the OCR process loses much of the layout and structural information, its output alone does not support navigation. A TOC is needed to provide a convenient and quick way to locate content of interest. We propose a hybrid method for TOC extraction that combines a rule-based method and an SVM-based method. The rule-based method focuses on recovering the TOC from books that have TOC pages, while the SVM-based method handles books without TOC pages. Experimental results indicate that the proposed methods achieve performance comparable to that of the other participants in the ICDAR 2011 Book Structure Extraction competition. © Springer-Verlag 2012. |
Year | DOI | Venue |
---|---|---|
2011 | 10.1007/978-3-642-35734-3_8 | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Keywords | Field | DocType |
book structure extraction,table of contents,xml extraction | Information retrieval,Structure extraction,Support vector machine,Table of contents,Artificial intelligence,Engineering | Conference |
Volume | Issue | ISSN |
7424 LNCS | null | 1611-3349 |
Citations | PageRank | References |
4 | 0.47 | 6 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Caihua Liu | 1 | 11 | 2.64 |
Jiajun Chen | 2 | 244 | 45.03 |
Xiaofeng Zhang | 3 | 37 | 9.53 |
Jie Liu | 4 | 1419 | 116.47 |
Yalou Huang | 5 | 744 | 53.86 |