Title
A Kolmogorov-Smirnov statistic based segmentation approach to learning from imbalanced datasets: With application in property refinance prediction
Abstract
Classification is an important task in data mining. Class imbalance has been reported to hinder the performance of standard classification models. However, our study shows that class imbalance may not be the only cause of poor performance; the underlying complexity of the problem may play a more fundamental role. In this paper, a decision tree method based on the Kolmogorov-Smirnov statistic (K-S tree) is proposed to segment the training data so that a complex problem can be divided into several easier sub-problems in which class imbalance becomes less challenging. The K-S tree is also used to perform feature selection, which not only selects relevant variables but also removes redundant ones. After segmentation, a two-way re-sampling method is used at the segment level to empirically determine the optimal sampling percentage, and the rebalanced data are used to fit logistic regression models, also at the segment level. The effectiveness of the proposed method is demonstrated through its application to property refinance prediction.
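The abstract does not spell out how the K-S tree chooses its splits, so the following is only a minimal sketch of the underlying idea, not the authors' procedure. It assumes a single numeric feature and binary labels (1 for the minority class, 0 for the majority class), and it computes the two-sample Kolmogorov-Smirnov statistic as the largest gap between the two class-conditional empirical CDFs. The function name `ks_split` and the choice to return the feature value at the largest gap as a candidate split point are illustrative assumptions.

```python
from bisect import bisect_right


def ks_split(values, labels):
    """Return (ks_statistic, threshold) for one numeric feature.

    Hypothetical helper for illustration only. The K-S statistic is the
    largest absolute gap between the empirical CDFs of the feature in the
    positive (label 1) and negative (label 0) classes. The threshold is
    the first feature value at which that largest gap occurs.
    """
    pos = sorted(v for v, y in zip(values, labels) if y == 1)
    neg = sorted(v for v, y in zip(values, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")

    best_gap, best_threshold = 0.0, None
    for t in sorted(set(values)):
        # Fraction of each class with feature value <= t.
        cdf_pos = bisect_right(pos, t) / len(pos)
        cdf_neg = bisect_right(neg, t) / len(neg)
        gap = abs(cdf_pos - cdf_neg)
        if gap > best_gap:
            best_gap, best_threshold = gap, t
    return best_gap, best_threshold


# A perfectly separable toy feature: every negative is <= 3.
print(ks_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # (1.0, 3)
```

Under these assumptions, a feature with a larger K-S statistic separates the two classes better, which is one plausible way such a statistic could both rank variables for selection and suggest where to cut the data into segments.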
Year: 2012
DOI: 10.1016/j.eswa.2011.12.011
Venue: Expert Syst. Appl.
Keywords: data mining, kolmogorov-smirnov statistic, segment level, complex problem, segmentation approach, two-way re-sampling method, imbalanced datasets, training data, rebalanced data, class imbalance, k-s tree, decision tree method, segmentation, decision tree
Field: Training set, Data mining, Decision tree, Statistic, Feature selection, Segmentation, Computer science, Kolmogorov–Smirnov test, Sampling (statistics), Artificial intelligence, Logistic regression, Machine learning
DocType: Journal
Volume: 39
Issue: 6
ISSN: 0957-4174
Citations: 4
PageRank: 0.42
References: 22
Authors: 2
Name, Order, Citations, PageRank
Rongsheng Gong, 1, 10, 1.30
Samuel H. Huang, 2, 193, 19.64