Title
Virtual Big Data for GAN Based Data Augmentation
Abstract
Class imbalance arises in many real-world applications, and GAN based data augmentation is considered an efficient approach to address it. GANs need a large amount of training data to generate useful augmented data, but sufficient training data is not available in many research areas. In this paper, we introduce a new concept called virtual big data to address this problem. We prove that virtual big data can provide GANs with sufficient training data to generate effective augmented data while reducing the mode collapse and vanishing generator gradient problems. We show that the curse of dimensionality, usually considered a negative factor in machine learning, can play a positive role in mitigating vanishing generator gradients by making the discriminator less perfect. First, we transform the training data from an n-dimensional space into an m-dimensional space, where m = c * n and c is the concatenation factor. To do so, c different training instances are selected and concatenated to form a single c * n dimensional instance. Increasing the dimension of the training data from n to c * n is the key to increasing the number of training instances from N to C(N, c), the number of ways to choose c instances out of N. The transformed training data are called virtual big data since they differ from the original training instances in both size and dimension. Our experiments show that V-GAN, a GAN trained on virtual big data, can outperform standard GANs when training data is extremely scarce. Furthermore, V-GAN can outperform traditional oversampling techniques in terms of precision, F1 score and Area Under Curve (AUC) score.
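The concatenation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `virtual_big_data`, the `limit` parameter, and the toy data are hypothetical, and the sketch assumes that every c-subset of instances is enumerated in index order, whereas a practical pipeline might sample subsets instead.

```python
from itertools import combinations, islice
from math import comb


def virtual_big_data(instances, c, limit=None):
    """Concatenate each c-subset of instances into one (c * n)-dimensional instance.

    `limit` is a hypothetical cap on the number of virtual instances,
    since C(N, c) grows quickly with N and c.
    """
    subsets = combinations(instances, c)
    if limit is not None:
        subsets = islice(subsets, limit)
    # Flatten the c selected n-dimensional instances into one long vector.
    return [[value for instance in subset for value in instance] for subset in subsets]


# Toy data (assumed for illustration): N = 6 instances with n = 2 features each.
data = [[float(2 * i), float(2 * i + 1)] for i in range(6)]
virtual = virtual_big_data(data, c=3)

print(len(virtual), comb(6, 3))  # both are 20
print(len(virtual[0]))           # 6, i.e. c * n
print(virtual[0])                # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```

With N = 6 and c = 3, six original instances yield C(6, 3) = 20 virtual instances of dimension 6, which shows how a modest concatenation factor enlarges a scarce training set.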
Year
2019
DOI
10.1109/BigData47090.2019.9006268
Venue
2019 IEEE International Conference on Big Data (Big Data)
Keywords
GAN, Data Augmentation, Imbalanced Data Classification
Field
Training set, Data mining, Discrete mathematics, F1 score, Discriminator, Oversampling, Computer science, Curse of dimensionality, Concatenation, Big data
DocType
Conference
ISSN
2639-1589
Citations
0
PageRank
0.34
References
0
Authors
3
Name | Order | Citations | PageRank
Hadi Mansourifar | 1 | 0 | 0.34
Lin Chen | 2 | 100 | 23.63
Weidong Shi | 3 | 145 | 6.45