Title |
---|
Application of Preprocessing Methods to Imbalanced Clinical Data: An Experimental Study |

Abstract |
---|
In this paper we describe an experimental study in which we analyzed the data difficulty factors encountered in imbalanced clinical data sets and examined how well selected data preprocessing methods addressed these factors. We considered five data sets describing various pediatric acute conditions. In all of these data sets the minority class was sparse and overlapped with the majority classes, and was thus difficult to learn. We studied five preprocessing methods (random undersampling, random oversampling, SMOTE, neighborhood cleaning rule and SPIDER2), which were combined with the following classifiers: k-nearest neighbors, decision trees and rules, naive Bayes, neural networks and support vector machines. Applying preprocessing always improved classification performance, and the largest improvement was observed for random undersampling. Moreover, naive Bayes was the best-performing classifier regardless of the preprocessing method used. |
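
The abstract only names the preprocessing methods. As a rough illustration of what three of them do to a binary data set, here is a minimal sketch; it is not the authors' implementation, which this record does not describe. It assumes plain Python lists of numeric feature vectors, a single minority label treated against all other classes, and Euclidean distance. The function names `random_undersample`, `random_oversample` and `smote_like` are hypothetical, and `smote_like` is a simplified SMOTE-style variant (random choice of base example). Neighborhood cleaning rule and SPIDER2 are not sketched.

```python
import math
import random


def random_undersample(X, y, minority, seed=0):
    """Keep every minority example and a random subset of majority
    examples of the same size (binary view: minority vs. the rest)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    n_keep = min(len(minority_idx), len(majority_idx))
    keep = sorted(minority_idx + rng.sample(majority_idx, n_keep))
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, minority, seed=0):
    """Duplicate randomly chosen minority examples until the minority
    class is as large as the rest of the data."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_extra = max(0, (len(y) - len(minority_idx)) - len(minority_idx))
    idx = list(range(len(y))) + [rng.choice(minority_idx) for _ in range(n_extra)]
    return [X[i] for i in idx], [y[i] for i in idx]


def smote_like(minority_X, n_new, k=5, seed=0):
    """Simplified SMOTE-style generation: each synthetic point lies on the
    segment between a random minority example and one of its k nearest
    minority neighbours (Euclidean distance). Needs >= 2 minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority_X))
        base = minority_X[i]
        # Candidate neighbours: all other minority examples, nearest first.
        others = [p for j, p in enumerate(minority_X) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic
```

For example, with 10 majority and 3 minority examples, `random_undersample` returns 3 examples of each class, while `random_oversample` returns 10 of each. Undersampling discards majority data, random oversampling only duplicates minority examples, and SMOTE-style generation instead interpolates new minority points between existing neighbours.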

Year | DOI | Venue |
---|---|---|
2016 | 10.1007/978-3-319-39796-2_41 | INFORMATION TECHNOLOGIES IN MEDICINE, ITIB 2016, VOL 1 |

Keywords | Document Type | Volume |
---|---|---|
Clinical data, Class imbalance, Data difficulty factors, Preprocessing methods, Classification performance | Conference | 471 |

ISSN | Citations | PageRank |
---|---|---|
2194-5357 | 4 | 0.47 |

References | Authors |
---|---|
11 | 5 |

Name | Order | Citations | PageRank |
---|---|---|---|
Szymon Wilk | 1 | 461 | 40.94 |
Jerzy Stefanowski | 2 | 1653 | 139.25 |
Szymon Wojciechowski | 3 | 4 | 0.47 |
Ken Farion | 4 | 106 | 12.61 |
Wojtek Michalowski | 5 | 266 | 41.48 |