Abstract |
---|
Tables store rich numerical data, but numerical reasoning over tables is still a challenge. In this paper, we find that the spreadsheet formula, a commonly used language for performing computations on numerical values in spreadsheets, is valuable supervision for numerical reasoning in tables. Considering the large number of spreadsheets available on the web, we propose FORTAP, the first exploration of leveraging spreadsheet formulas for table pretraining. Two novel self-supervised pretraining objectives are derived from formulas: numerical reference prediction (NRP) and numerical calculation prediction (NCP). While our proposed objectives are generic for encoders, to better capture spreadsheet table layouts and structures, we build FORTAP upon TUTA, the first transformer-based method for spreadsheet and web table pretraining with tree attention. FORTAP outperforms state-of-the-art methods by large margins on three representative datasets of formula prediction, question answering, and cell type classification, showing the great potential of leveraging formulas for table pretraining. The code will be released at https://github.com/microsoft/TUTA_table_understanding. |

Year | DOI | Venue |
---|---|---|
2022 | 10.18653/v1/2022.acl-long.82 | Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Vol 1: (Long Papers) |

DocType | Volume | Citations |
---|---|---|
Conference | Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 0 | 7 |

Name | Order | Citations | PageRank |
---|---|---|---|
Zhoujun Cheng | 1 | 0 | 1.69 |
Haoyu Dong | 2 | 5 | 2.76 |
Ran Jia | 3 | 1 | 1.45 |
Pengfei Wu | 4 | 0 | 0.34 |
Shi Han | 5 | 296 | 15.22 |
Fan Cheng | 6 | 0 | 0.34 |
Dongmei Zhang | 7 | 1439 | 132.94 |