Abstract |
---|
Class imbalance arises in many real-world applications, and GAN-based data augmentation is considered an effective way to address it. GANs, however, need a large amount of training data to generate useful augmented data, and in many research areas that much data is not available. In this paper, we introduce a new concept called virtual big data to address this problem. We prove that virtual big data can provide GANs with sufficient training data to generate effective augmented data while reducing mode collapse and vanishing generator gradients. We show that the curse of dimensionality, usually considered a negative factor in machine learning, can play a positive role in mitigating vanishing generator gradients by making the discriminator less perfect. First, we transform the training data from an n-dimensional space into an m-dimensional space, where m = c * n and c is the concatenation factor. To do so, c different training instances are selected and concatenated to form a (c * n)-dimensional instance. Increasing the dimension of the training data from n to c * n is the key to increasing the number of training instances from N to C(N, c), the number of ways to choose c of the N instances. The transformed training data are called virtual big data because they differ from the original training instances in both size and dimension. Our experiments show that V-GAN, a GAN trained on virtual big data, can outperform standard GANs when training data are extremely scarce. Furthermore, V-GAN can outperform traditional oversampling techniques in terms of precision, F1 score, and Area Under Curve (AUC) score. |
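
The concatenation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `make_virtual_big_data`, the use of NumPy, the random toy data, and the choice to keep instances in index order within each combination are all assumptions made here for illustration.

```python
from itertools import combinations
from math import comb

import numpy as np


def make_virtual_big_data(X, c):
    """Concatenate every unordered choice of c distinct rows of X.

    Hypothetical helper: X has shape (N, n); the result has shape
    (C(N, c), c * n). Rows within a combination stay in index order,
    which is an assumption, not something the abstract specifies.
    """
    X = np.asarray(X)
    N, _ = X.shape
    # X[list(idx)] has shape (c, n); flattening it concatenates the c rows.
    rows = [X[list(idx)].reshape(-1) for idx in combinations(range(N), c)]
    return np.stack(rows)


# Toy data: N = 6 instances in n = 4 dimensions, concatenation factor c = 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
V = make_virtual_big_data(X, c=3)
print(V.shape)  # (20, 12): C(6, 3) = 20 instances of dimension 3 * 4 = 12
```

With N = 6 and c = 3, six original instances yield 20 virtual ones. The count grows combinatorially with N, so enumerating every combination is practical only for small N; sampling combinations instead would be a separate design choice that the abstract does not discuss.
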
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/BigData47090.2019.9006268 | 2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) |
Keywords | Field | DocType |
GAN, Data Augmentation, Imbalanced Data Classification | Training set, Data mining, Discrete mathematics, F1 score, Discriminator, Oversampling, Computer science, Curse of dimensionality, Concatenation, Big data | Conference |
ISSN | Citations | PageRank |
2639-1589 | 0 | 0.34 |
References | Authors | |
0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hadi Mansourifar | 1 | 0 | 0.34 |
Lin Chen | 2 | 100 | 23.63 |
Weidong Shi | 3 | 145 | 6.45 |