Abstract | ||
---|---|---|
Financial statements report crucial information in tables with complex semantic structure, which are desirable, yet challenging, to interpret automatically. For example, in such tables a row of data cells is often explained by the headers of other rows. In a departure from prior art, we propose a rectangle mining framework for understanding complex tables, which considers rectangular regions rather than individual cells or pairs of cells in a table. We instantiate this framework with ReMine, an algorithm for extracting the row header semantics of tables, and show that it significantly outperforms prior pairwise classification approaches on two datasets: (i) a set of manually labeled financial tables from multiple companies, and (ii) the ICDAR 2013 Table Competition dataset. |
Year | DOI | Venue |
---|---|---|
2017 | 10.1109/ICDAR.2017.52 | 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) |
Keywords | Field | DocType |
complex tables,row header semantics,rectangle mining method,complex semantic structure,data cells,financial tables,ICDAR 2013 Table Competition dataset,financial statements,ReMine algorithm | Row,Computer science,Rectangle,Feature extraction,Prediction algorithms,Header,Finance,Semantics | Conference |
Volume | ISSN | ISBN |
01 | 1520-5363 | 978-1-5386-3587-2 |
Citations | PageRank | References |
0 | 0.34 | 15 |
Authors | ||
---|---|---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Xilun Chen | 1 | 38 | 7.71 |
Laura Chiticariu | 2 | 10 | 1.51 |
Marina Danilevsky | 3 | 0 | 0.68 |
Alexandre V. Evfimievski | 4 | 501 | 41.76 |
Prithviraj Sen | 5 | 837 | 38.24 |