Title
FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining
Abstract
Tables store rich numerical data, but numerical reasoning over tables remains a challenge. In this paper, we find that the spreadsheet formula, a language commonly used to perform computations on numerical values in spreadsheets, provides valuable supervision for numerical reasoning over tables. Given the large number of spreadsheets available on the web, we propose FORTAP, the first exploration of leveraging spreadsheet formulas for table pretraining. Two novel self-supervised pretraining objectives are derived from formulas: numerical reference prediction (NRP) and numerical calculation prediction (NCP). While our proposed objectives are generic to encoders, to better capture spreadsheet table layouts and structures we build FORTAP upon TUTA, the first transformer-based method for spreadsheet and web table pretraining with tree attention. FORTAP outperforms state-of-the-art methods by large margins on three representative datasets for formula prediction, question answering, and cell type classification, showing the great potential of leveraging formulas for table pretraining. The code will be released at https://github.com/microsoft/TUTA_table_understanding.
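The abstract above describes deriving self-supervision from formulas: which cells a formula references (NRP) and which calculations it performs (NCP). As a minimal, hypothetical sketch of that idea (not the paper's actual pipeline), one could mine both kinds of labels from a formula string with simple pattern matching:

```python
import re

# Hypothetical illustration: given a spreadsheet formula, derive the two
# kinds of supervision named in the abstract -- referenced cells/ranges
# (NRP-style targets) and applied operators (NCP-style targets).
CELL_REF = re.compile(r"[A-Z]+[0-9]+(?::[A-Z]+[0-9]+)?")   # e.g. C1 or B2:B5
OPERATOR = re.compile(r"[A-Z]+(?=\()|[+\-*/]")             # functions and arithmetic ops

def formula_supervision(formula: str):
    """Extract referenced cells/ranges and operators from a formula string."""
    body = formula.lstrip("=")
    refs = CELL_REF.findall(body)
    ops = OPERATOR.findall(body)
    return refs, ops

refs, ops = formula_supervision("=SUM(B2:B5)/C1")
print(refs)  # ['B2:B5', 'C1']
print(ops)   # ['SUM', '/']
```

In a real pretraining setup these mined labels would supervise an encoder's cell representations; this sketch only shows where such labels could come from.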
Year
2022
DOI
10.18653/v1/2022.acl-long.82
Venue
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Vol 1: Long Papers
DocType
Conference
Volume
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Citations
0
PageRank
0.34
References
0
Authors
7
Name           Order  Citations  PageRank
Cheng Zhoujun  1      0          1.69
Haoyu Dong     2      5          2.76
Ran Jia        3      1          1.45
Pengfei Wu     4      0          0.34
Shi Han        5      296        15.22
Fan Cheng      6      0          0.34
Dongmei Zhang  7      1439       132.94