Title
Learning Imbalanced Datasets Based on SMOTE and Gaussian Distribution
Abstract
Learning from imbalanced datasets is a ubiquitous challenge for researchers in data mining and machine learning. Conventional classifiers are often biased towards the majority class, because their loss functions optimize aggregate quantities that the majority class dominates. In this paper, we present two effective sampling methods that improve the data distribution. The first, a rebalancing method called Adaptive-SMOTE, improves SMOTE by adaptively selecting groups of Inner and Danger data from the minority class and compiling a new minority class from the selected data; this prevents the category boundary from expanding and strengthens the distributional characteristics of the original data. The second, Gaussian Oversampling, combines dimensionality reduction with the Gaussian distribution, which makes the tail of the Gaussian distribution thinner. Cross-validation experiments on 15 datasets show that both sampling methods achieve significant improvements over other typical methods. Adaptive-SMOTE attains higher F-measure and Acc values than existing sampling methods and is more robust across classifiers and across datasets with different imbalance ratios (Imb). Gaussian Oversampling is more efficient on extremely imbalanced classification problems.
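The abstract names the two oversampling schemes but not their exact rules, so the Python sketch below is a hedged illustration under stated assumptions, not the authors' implementation. Assumptions: the Inner/Danger/noise tagging borrows Borderline-SMOTE's k-nearest-neighbour rule; new minority samples come from standard SMOTE interpolation using only Inner and Danger seeds; and Gaussian Oversampling is approximated by PCA followed by sampling from an axis-aligned Gaussian in the reduced space (the paper's tail-thinning step is not reproduced). The function names `categorize_minority`, `smote_from`, `adaptive_smote_sketch` and `gaussian_oversample` are hypothetical; only NumPy is required.

```python
import numpy as np


def categorize_minority(X, y, minority_label, k=5):
    """Tag each minority sample as 'inner', 'danger' or 'noise'.

    ASSUMPTION: Borderline-SMOTE's rule, used as a stand-in for the paper's
    Inner/Danger grouping. With m = number of majority samples among the k
    nearest neighbours: m == k -> noise, k/2 <= m < k -> danger, m < k/2 -> inner.
    """
    groups = {}
    for i in np.flatnonzero(y == minority_label):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf  # a sample is not its own neighbour
        neighbours = np.argsort(dist)[:k]
        m = int(np.sum(y[neighbours] != minority_label))
        if m == k:
            groups[int(i)] = "noise"
        elif m >= k / 2:
            groups[int(i)] = "danger"
        else:
            groups[int(i)] = "inner"
    return groups


def smote_from(X_seed, n_new, k=5, rng=None):
    """Classic SMOTE interpolation restricted to X_seed:
    x_new = x + u * (x_nn - x), u ~ U(0, 1), x_nn one of x's k nearest seeds."""
    rng = np.random.default_rng(rng)
    out = np.empty((n_new, X_seed.shape[1]))
    for j in range(n_new):
        x = X_seed[rng.integers(len(X_seed))]
        dist = np.linalg.norm(X_seed - x, axis=1)
        order = np.argsort(dist)
        order = order[dist[order] > 0][:k]  # drop x itself and exact duplicates
        if len(order) == 0:  # lone seed: nothing to interpolate towards
            out[j] = x
            continue
        x_nn = X_seed[rng.choice(order)]
        out[j] = x + rng.random() * (x_nn - x)
    return out


def adaptive_smote_sketch(X, y, minority_label, k=5, rng=None):
    """HYPOTHETICAL Adaptive-SMOTE-like pipeline for a binary problem: drop
    'noise' minority samples, then oversample from Inner + Danger seeds until balanced."""
    groups = categorize_minority(X, y, minority_label, k)
    seeds = [i for i, g in groups.items() if g in ("inner", "danger")]
    n_new = int(np.sum(y != minority_label)) - int(np.sum(y == minority_label))
    if n_new <= 0 or not seeds:
        return X, y
    X_new = smote_from(X[seeds], n_new, k, rng)
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority_label)])


def gaussian_oversample(X_minority, n_new, n_components=2, rng=None):
    """HYPOTHETICAL stand-in for Gaussian Oversampling: reduce dimension with PCA,
    sample an axis-aligned Gaussian fitted in the reduced space, map back.
    Does NOT reproduce the paper's tail-thinning step."""
    rng = np.random.default_rng(rng)
    mean = X_minority.mean(axis=0)
    centred = X_minority - mean
    _, _, vt = np.linalg.svd(centred, full_matrices=False)  # rows: principal axes
    W = vt[: min(n_components, vt.shape[0])]
    Z = centred @ W.T  # reduced coordinates (zero mean by construction)
    Z_new = rng.normal(0.0, Z.std(axis=0), size=(n_new, W.shape[0]))
    return mean + Z_new @ W


if __name__ == "__main__":
    # Toy binary problem: 10 majority points near the origin, 4 minority points near (10, 10).
    X = np.vstack([[[i, j] for i in range(5) for j in range(2)],
                   [[10, 10], [10, 11], [11, 10], [11, 11]]]).astype(float)
    y = np.array([0] * 10 + [1] * 4)
    Xb, yb = adaptive_smote_sketch(X, y, minority_label=1, k=3, rng=0)
    print(Xb.shape, np.bincount(yb))  # (20, 2) [10 10]
    print(gaussian_oversample(X[y == 1], n_new=5, n_components=1, rng=0).shape)  # (5, 2)
```

Restricting interpolation to Inner and Danger seeds keeps every synthetic point inside the convex hull of the retained minority samples, which is one way to read the abstract's claim that Adaptive-SMOTE avoids expanding the category boundary.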
Year: 2020
DOI: 10.1016/j.ins.2019.10.048
Venue: Information Sciences
Keywords: Imbalanced, Oversample, Gaussian distribution, SMOTE
Field: Dimensionality reduction, Oversampling, Pattern recognition, Robustness (computer science), Gaussian, Artificial intelligence, Sampling (statistics), Machine learning, Mathematics
DocType: Journal
Volume: 512
ISSN: 0020-0255
Citations: 2
PageRank: 0.36
References: 0
Authors: 4
Name | Order | Citations | PageRank
Tingting Pan | 1 | 3 | 1.06
Junhong Zhao | 2 | 27 | 7.02
Wei Wu | 3 | 305 | 28.13
Jie Yang | 4 | 3 | 1.06