Abstract

Imbalanced data is a perennial problem that impairs the learning ability of current machine-learning-based classification models. One way to address it is to use data augmentation to expand the training set. For image data, a number of augmentation techniques have proven effective in previous work. For textual data, however, natural language consists of discrete units, so techniques that randomly perturb the signal may be ineffective. Moreover, because textual datasets can differ substantially (e.g., in domain), an augmentation approach that improves classification on one dataset may be detrimental on another. For practitioners, comparing data augmentation techniques is non-trivial: the methods may need to be incorporated into different system architectures, and some approaches, such as generative models, are laborious to implement. To address these challenges, we develop EasyAug, a data augmentation platform that provides several augmentation approaches. Users can conveniently compare classification results across approaches and choose the one best suited to their own dataset. In addition, the system is extensible: further augmentation approaches can be incorporated so that a new method can be compared comprehensively with the baselines with minimal effort.
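To make the point about randomly perturbing discrete text concrete, the sketch below implements two generic token-level perturbations, random swap and random deletion. These are common textbook operations chosen for illustration only; they are not claimed to be EasyAug's implementation, and the function names `random_swap` and `random_deletion` are hypothetical.

```python
import random
from typing import List


def random_swap(tokens: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Return a copy of `tokens` with `n_swaps` random pairs of positions exchanged."""
    out = list(tokens)  # copy so the caller's list is not mutated
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct positions
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each token independently with probability `p`, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    # Never return an empty sentence: fall back to one randomly chosen token.
    return kept if kept else [rng.choice(tokens)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible perturbations
    sentence = "the battery life of this phone is not great".split()
    print(" ".join(random_swap(sentence, n_swaps=2, rng=rng)))
    print(" ".join(random_deletion(sentence, p=0.3, rng=rng)))
```

Unlike small pixel-level noise on an image, each of these edits changes whole words: deleting "not" from the example sentence, for instance, would flip its sentiment while keeping the original label, which is one way such perturbations can hurt rather than help a classifier.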
Year | DOI | Venue |
---|---|---|
2020 | 10.1145/3366424.3383552 | WWW '20: The Web Conference 2020, Taipei, Taiwan, April 2020 |

DocType | ISBN | Citations |
---|---|---|
Conference | 978-1-4503-7024-0 | 1 |

PageRank | References | Authors |
---|---|---|
0.36 | 0 | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Siyuan Qiu | 1 | 1 | 0.70 |
Binxia Xu | 2 | 1 | 0.36 |
Jie Zhang | 3 | 1995 | 156.26 |
Yafang Wang | 4 | 134 | 13.56 |
Xiaoyu Shen | 5 | 1 | 0.70 |
Gerard de Melo | 6 | 723 | 53.54 |
Chong Long | 7 | 94 | 6.82 |
Xiaolong Li | 8 | 362 | 36.92 |