Abstract | ||
---|---|---|
We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and LaTeX documents on the internet. Existing research on image-based table detection and recognition usually fine-tunes pre-trained models on out-of-domain data with a few thousand human-labeled examples, an approach that generalizes poorly to real-world applications. With TableBank, which contains 417K high-quality labeled tables, we build several strong baselines using state-of-the-art deep neural network models. We make TableBank publicly available (https://github.com/doc-analysis/TableBank) and hope it will empower more deep learning approaches to the table detection and recognition task. |
Year | Venue | DocType |
---|---|---|
2020 | LREC | Conference |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Minghao Li | 1 | 1 | 2.12 |
Lizhen Cui | 2 | 154 | 38.68 |
Shaohan Huang | 3 | 57 | 10.29 |
Furu Wei | 4 | 1956 | 107.57 |
Ming Zhou | 5 | 4262 | 251.74 |
Zhoujun Li | 6 | 964 | 115.99 |