Abstract
---
Text classification is a widely studied problem with broad applications. In many real-world settings, the number of texts available for training classification models is limited, which renders these models prone to overfitting. To address this problem, we propose SSL-Reg, a data-dependent regularization approach based on self-supervised learning (SSL). SSL (Devlin et al., 2019a) is an unsupervised learning approach that defines auxiliary tasks on input data without using any human-provided labels and learns data representations by solving these auxiliary tasks. In SSL-Reg, a supervised classification task and an unsupervised SSL task are performed simultaneously. The SSL task is defined purely on input texts, without any human-provided labels. Training a model on an SSL task prevents it from overfitting to the limited number of class labels in the classification task. Experiments on 17 text classification datasets demonstrate the effectiveness of our proposed method. Code is available at https://github.com/UCSD-AI4H/SSReg.
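To make the joint training described in the abstract concrete, below is a minimal sketch of an SSL-Reg-style objective: a classification loss plus a weighted self-supervised loss (here masked-token prediction) computed over a shared text encoder. The toy encoder, the head names, and the weight `lam` are illustrative assumptions, not the paper's exact architecture or hyperparameters; see the repository linked above for the authors' implementation.

```python
# Sketch: total loss = classification loss + lam * self-supervised (MLM-style) loss.
# All sizes, names, and the lam value are assumptions for illustration.
import torch
import torch.nn as nn

VOCAB_SIZE, HIDDEN, NUM_CLASSES = 30522, 256, 2

class SSLRegModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        layer = nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # shared encoder
        self.cls_head = nn.Linear(HIDDEN, NUM_CLASSES)  # supervised task head
        self.mlm_head = nn.Linear(HIDDEN, VOCAB_SIZE)   # self-supervised task head

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))          # (batch, seq, hidden)
        cls_logits = self.cls_head(h[:, 0])              # first token as sentence rep
        mlm_logits = self.mlm_head(h)                    # per-token vocabulary logits
        return cls_logits, mlm_logits

model = SSLRegModel()
ce = nn.CrossEntropyLoss(ignore_index=-100)              # -100 skips unmasked positions
lam = 0.1                                                # regularization weight (assumed)

token_ids = torch.randint(0, VOCAB_SIZE, (8, 32))        # toy batch of token ids
labels = torch.randint(0, NUM_CLASSES, (8,))             # class labels (supervised task)
mlm_targets = torch.full((8, 32), -100, dtype=torch.long)
mlm_targets[:, 5] = token_ids[:, 5]                      # pretend position 5 was masked

cls_logits, mlm_logits = model(token_ids)
loss = ce(cls_logits, labels) + lam * ce(
    mlm_logits.reshape(-1, VOCAB_SIZE), mlm_targets.reshape(-1)
)
loss.backward()  # both tasks update the shared encoder
```

Because both losses backpropagate through the same encoder, the unsupervised term regularizes the representations rather than being a separate pretraining stage, which is the key difference from the usual pretrain-then-finetune pipeline.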
Year | DOI | Venue
---|---|---
2021 | 10.1162/tacl_a_00389 | Transactions of the Association for Computational Linguistics

DocType | Volume | Citations
---|---|---
Journal | 9 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Meng Zhou | 1 | 0 | 0.68 |
Zechen Li | 2 | 0 | 0.34 |
Pengtao Xie | 3 | 339 | 22.63 |