Title
An efficiency curve for evaluating imbalanced classifiers considering intrinsic data characteristics: Experimental analysis
Abstract
Balancing the accuracy rates of the majority and minority classes is challenging in imbalanced classification. Furthermore, data characteristics, which existing evaluation methods generally neglect, have a significant impact on the performance of imbalanced classifiers. The objective of this study is to introduce a new criterion for comprehensively evaluating imbalanced classifiers. Specifically, we introduce an efficiency curve, established using data envelopment analysis without explicit inputs (DEA-WEI), to determine the trade-off between the benefit of improved minority class accuracy and the cost of reduced majority class accuracy. We then analyze the impact of the imbalance ratio and typical imbalanced data characteristics on the efficiency of the classifiers. Empirical analyses using 68 imbalanced datasets reveal that traditional classifiers such as C4.5 and k-nearest neighbors are more effective on disjunct data, whereas ensemble and undersampling techniques are more effective on overlapping and noisy data. The efficiency of cost-sensitive classifiers decreases dramatically as the imbalance ratio increases. Finally, we investigate the reasons for the different efficiencies of classifiers on imbalanced data and recommend steps for selecting appropriate classifiers based on data characteristics.
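The trade-off described in the abstract can be illustrated with a small sketch. The abstract does not give the exact model, so the code below assumes a common output-oriented DEA formulation with no inputs, two outputs (minority and majority class accuracy), and a convexity constraint on the weights. The function names, the confusion-matrix counts, and the classifier labels are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from itertools import combinations

def rates(tp, fn, tn, fp):
    """Return (minority accuracy, majority accuracy), minority class = positive."""
    return tp / (tp + fn), tn / (tn + fp)

def dea_wei_efficiency(target, units):
    """Output-oriented efficiency of `target` against all `units`.

    Assumed model (not taken from the paper):
        max theta  s.t.  sum_j lam_j * y_j >= theta * target,
                         sum_j lam_j = 1,  lam_j >= 0
    With two outputs, an optimum is reached either at a single unit or at a
    mix of two units where both output constraints are tight, so a direct
    enumeration suffices. Both outputs of `target` must be positive.
    Returns 1 / theta, a value in (0, 1]; 1.0 means `target` is on the frontier.
    """
    a, b = target

    def theta(p):
        # Largest radial expansion of target that point p still dominates.
        return min(p[0] / a, p[1] / b)

    best = max(theta(u) for u in units)
    for u, v in combinations(units, 2):
        # Find lam where both scaled outputs of the mix are equal.
        denom = (u[0] - v[0]) / a - (u[1] - v[1]) / b
        if abs(denom) < 1e-12:
            continue
        lam = (v[1] / b - v[0] / a) / denom
        if 0.0 < lam < 1.0:
            mix = (lam * u[0] + (1 - lam) * v[0],
                   lam * u[1] + (1 - lam) * v[1])
            best = max(best, theta(mix))
    return 1.0 / best

# Hypothetical confusion-matrix counts (tp, fn, tn, fp) for three classifiers.
classifiers = {
    "A": rates(90, 10, 500, 500),
    "B": rates(50, 50, 900, 100),
    "C": rates(60, 40, 600, 400),
}
points = list(classifiers.values())
for name, point in classifiers.items():
    print(name, round(dea_wei_efficiency(point, points), 3))
```

Under these assumptions, a classifier scores 1.0 when no convex combination of the other classifiers improves both class accuracies proportionally, and scores below 1.0 otherwise. Plotting the frontier units in the (minority accuracy, majority accuracy) plane gives one possible reading of an efficiency curve.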
Year
2022
DOI
10.1016/j.ins.2022.06.045
Venue
Information Sciences
Keywords
Classification, Imbalanced dataset, Data intrinsic characteristics, Assessment metrics, Efficiency
DocType
Journal
Volume
608
ISSN
0020-0255
Citations
0
PageRank
0.34
References
22
Authors
4
Name               Order  Citations  PageRank
Xiangrui Chao      1      1          0.68
Gang Kou           2      2527       191.95
Yi Peng            3      1303       78.20
Alberto Fernández  4      0          0.34