Title
A Gaussian mixture model based combined resampling algorithm for classification of imbalanced credit data sets
Abstract
Credit scoring is a binary classification problem. Credit data sets are also frequently imbalanced: one class contains few samples while the other contains many. As a result, classification performance is degraded when a traditional classifier is applied directly to such data. To improve the classification of credit data sets, a combined resampling algorithm based on a Gaussian mixture model is proposed. The approach first uses a sampling factor to determine the target number of samples for the majority class and the minority class. Gaussian mixture clustering is then used to undersample the majority class, and the synthetic minority oversampling technique (SMOTE) is used to oversample the minority class, so that the imbalance is eliminated. The proposed method is compared with several resampling methods commonly used in the analysis of imbalanced credit data sets. The experimental results show that it consistently improves classification performance as measured by F-measure, AUC, G-mean, and other metrics. The method is also strongly robust on credit data sets.
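The resampling pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the sampling-factor formula or the rule for selecting samples from each Gaussian component, so `target_sizes` (a linear move of both classes toward their midpoint) and the proportional per-cluster quota in `gmm_undersample` are assumptions made for illustration, as are all function names and the choice of three mixture components. The oversampling step is a basic SMOTE interpolation written with scikit-learn's `NearestNeighbors`. Labels are assumed to be 0 (majority) and 1 (minority).

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import NearestNeighbors


def target_sizes(n_major, n_minor, factor):
    """Hypothetical sampling-factor rule (an assumption, not from the paper).

    factor=0 keeps the original counts; factor=1 moves both classes to the
    midpoint of the two original counts.
    """
    midpoint = (n_major + n_minor) // 2
    new_major = int(round(n_major - factor * (n_major - midpoint)))
    new_minor = int(round(n_minor + factor * (midpoint - n_minor)))
    return new_major, new_minor


def gmm_undersample(X_major, n_keep, n_components=3, seed=0):
    """Cluster the majority class with a GMM, then keep a random subset of
    each cluster in proportion to the cluster's size (assumed rule)."""
    rng = np.random.default_rng(seed)
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    labels = gmm.fit(X_major).predict(X_major)
    counts = np.bincount(labels, minlength=n_components)
    # Largest-remainder allocation so the quotas sum exactly to n_keep.
    exact = n_keep * counts / counts.sum()
    quota = np.floor(exact).astype(int)
    leftover = int(n_keep - quota.sum())
    order = np.argsort(exact - quota)[::-1]
    quota[order[:leftover]] += 1
    keep = []
    for k in range(n_components):
        if quota[k] > 0:
            idx = np.flatnonzero(labels == k)
            keep.extend(rng.choice(idx, size=quota[k], replace=False))
    return X_major[np.array(keep, dtype=int)]


def smote(X_minor, n_new, k=5, seed=0):
    """Basic SMOTE: interpolate between a minority sample and one of its
    k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_minor) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minor)
    # Column 0 is the query point itself, so drop it.
    neighbours = nn.kneighbors(X_minor, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_minor), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_minor[base] + gap * (X_minor[partner] - X_minor[base])


def combined_resample(X, y, factor=1.0, seed=0):
    """Undersample class 0 (majority) and oversample class 1 (minority)."""
    X_maj, X_min = X[y == 0], X[y == 1]
    n_maj, n_min = target_sizes(len(X_maj), len(X_min), factor)
    X_maj_new = gmm_undersample(X_maj, n_maj, seed=seed)
    X_min_new = np.vstack([X_min, smote(X_min, n_min - len(X_min), seed=seed)])
    X_out = np.vstack([X_maj_new, X_min_new])
    y_out = np.concatenate([np.zeros(len(X_maj_new), dtype=int),
                            np.ones(len(X_min_new), dtype=int)])
    return X_out, y_out


# Toy usage on synthetic data (not a credit data set).
demo_rng = np.random.default_rng(42)
X_demo = np.vstack([demo_rng.normal(0, 1, (900, 2)),
                    demo_rng.normal(2, 1, (100, 2))])
y_demo = np.array([0] * 900 + [1] * 100)
X_res, y_res = combined_resample(X_demo, y_demo, factor=1.0)
print(np.bincount(y_res))  # class counts after resampling
```

Sampling within each Gaussian component, rather than uniformly over the whole majority class, is intended to preserve the cluster structure of the majority class after undersampling. The published method may select samples differently.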
Year
2019
DOI
10.1007/s13042-019-00953-2
Venue
International Journal of Machine Learning and Cybernetics
Keywords
Credit scoring, Imbalanced data, Combined resampling, Gaussian mixture model
Field
Data set, Oversampling, Computer science, Undersampling, Algorithm, Robustness (computer science), Gaussian, Cluster analysis, Resampling, Mixture model
DocType
Journal
Volume
10
Issue
12
ISSN
1868-8071
Citations
2
PageRank
0.36
References
0
Authors
6
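| Name | Order | Citations | PageRank |
|---|---|---|---|
| Xu Han | 1 | 2 | 0.36 |
| Runbang Cui | 2 | 2 | 0.36 |
| Yanfei Lan | 3 | 218 | 15.92 |
| Yanzhe Kang | 4 | 2 | 0.69 |
| Jiang Deng | 5 | 2 | 0.36 |
| Ning Jia | 6 | 8 | 2.87 |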