Abstract
---
Based on recent advances in natural language modeling and text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised learning, focusing mainly on cases with scarce labeled data. Our method, referred to as language-model-based data augmentation (LAMBADA), involves fine-tuning a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Using the fine-tuned model and given a class label, new sentences for the class are generated. Our process then filters these new sentences using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers' performance on a variety of datasets. Moreover, LAMBADA significantly outperforms state-of-the-art data augmentation techniques, specifically those applicable to text classification tasks with little data.
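The abstract describes a three-step pipeline: fine-tune a language generator on the labeled data, generate candidate sentences conditioned on each class label, and keep only candidates that a classifier trained on the original data confirms. The filtering step can be sketched as below; this is a minimal illustration, not the authors' implementation, and `filter_generated`, `toy_classifier`, and the threshold value are hypothetical stand-ins (a real system would fine-tune a pre-trained generator and train a baseline classifier on the original labeled set).

```python
from typing import Callable, Dict, List, Tuple

def filter_generated(
    candidates: Dict[str, List[str]],
    classify: Callable[[str], Tuple[str, float]],
    threshold: float = 0.9,
) -> List[Tuple[str, str]]:
    """Keep (sentence, label) pairs only when the baseline classifier
    predicts the intended label with confidence >= threshold."""
    kept = []
    for label, sentences in candidates.items():
        for sentence in sentences:
            predicted, confidence = classify(sentence)
            if predicted == label and confidence >= threshold:
                kept.append((sentence, label))
    return kept

# Toy generated candidates, keyed by the class label they were
# conditioned on; one candidate under "positive" is mislabeled.
candidates = {
    "positive": ["great product, works well", "terrible, broke instantly"],
    "negative": ["never buying again"],
}

def toy_classifier(sentence: str) -> Tuple[str, float]:
    # Stand-in for a classifier trained on the original labeled data.
    negative_cues = ("terrible", "broke", "never")
    if any(cue in sentence for cue in negative_cues):
        return "negative", 0.95
    return "positive", 0.95

augmented = filter_generated(candidates, toy_classifier)
# The mislabeled candidate is dropped; the rest join the training set.
```

The filter trades recall for precision: discarding some valid generations is acceptable because the generator can produce candidates cheaply, while mislabeled examples would directly hurt the downstream classifier.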
Year | Venue | DocType | Volume | ISSN | Citations | PageRank | References | Authors
---|---|---|---|---|---|---|---|---
2020 | National Conference on Artificial Intelligence | Conference | 34 | 2159-5399 | 0 | 0.34 | 0 | 8
Name | Order | Citations | PageRank
---|---|---|---
Ateret Anaby-Tavor | 1 | 0 | 0.34
Boaz Carmeli | 2 | 41 | 6.70
Esther Goldbraich | 3 | 0 | 0.34
Amir Kantor | 4 | 24 | 3.17
George Kour | 5 | 0 | 0.34
Segev Shlomov | 6 | 0 | 0.34
Naama Tepper | 7 | 2 | 1.43
Naama Zwerdling | 8 | 0 | 0.34