Title
Do Not Have Enough Data? Deep Learning To The Rescue!
Abstract
Based on recent advances in natural language modeling and in text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised learning. We mainly focus on cases with scarce labeled data. Our method, referred to as language-model-based data augmentation (LAMBADA), involves fine-tuning a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Using the fine-tuned model and given a class label, new sentences for the class are generated. Our process then filters these new sentences by using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers' performance on a variety of datasets. Moreover, LAMBADA significantly improves upon the state-of-the-art techniques for data augmentation, specifically those applicable to text classification tasks with little data.
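The abstract outlines a three-stage pipeline: fine-tune a pre-trained language model on the small labeled set, generate new label-conditioned sentences, and keep only those a classifier trained on the original data confidently assigns to the intended class. Below is a minimal sketch of that pipeline, assuming GPT-2 from the HuggingFace transformers library as the generator and a TF-IDF plus logistic-regression model as the filter; the prompt format, confidence threshold, and omission of the fine-tuning step are illustrative simplifications, not the paper's exact setup.

```python
# Sketch of a LAMBADA-style augmentation loop (illustrative assumptions throughout).
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 0: a tiny labeled dataset standing in for the scarce-data setting.
texts = ["great service and friendly staff", "the product broke after one day",
         "absolutely loved it", "terrible support experience"]
labels = ["pos", "neg", "pos", "neg"]

# Step 1: train the filter classifier on the ORIGINAL data only.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

# Step 2: label-conditioned generation. In the full method the LM is first
# fine-tuned on "label <sep> text" strings from the labeled set; for brevity
# this sketch samples from the pretrained GPT-2 directly.
tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")

def generate_for_label(label, n=5, max_new_tokens=30):
    prompt = f"{label}:"  # assumed conditioning format, not the paper's
    ids = tok(prompt, return_tensors="pt").input_ids
    out = lm.generate(ids, do_sample=True, top_p=0.9, num_return_sequences=n,
                      max_new_tokens=max_new_tokens, pad_token_id=tok.eos_token_id)
    return [tok.decode(o[ids.shape[1]:], skip_special_tokens=True).strip() for o in out]

# Step 3: keep only synthetic sentences the filter assigns to the intended class.
augmented = []
for label in ("pos", "neg"):
    for sent in generate_for_label(label):
        probs = clf.predict_proba([sent])[0]
        if clf.classes_[probs.argmax()] == label and probs.max() > 0.6:  # assumed threshold
            augmented.append((sent, label))

print(augmented)  # synthetic (text, label) pairs to add to the training set
```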
Year: 2020
Venue: national conference on artificial intelligence
DocType: Conference
Volume: 34
ISSN: 2159-5399
Citations: 0
PageRank: 0.34
References: 0
Authors: 8
Name | Order | Citations | PageRank
Anaby-Tavor Ateret | 1 | 0 | 0.34
Boaz Carmeli | 2 | 41 | 6.70
Goldbraich Esther | 3 | 0 | 0.34
Amir Kantor | 4 | 24 | 3.17
Kour George | 5 | 0 | 0.34
Shlomov Segev | 6 | 0 | 0.34
Naama Tepper | 7 | 2 | 1.43
Zwerdling Naama | 8 | 0 | 0.34