Abstract |
---|
Imbalanced data classification remains a focus of intense research, mostly due to the prevalence of data imbalance in real-life application domains. A disproportion among objects from different classes may significantly degrade the performance of standard classification models. The first problem is a high imbalance ratio, which poses a serious learning difficulty and requires dedicated methods capable of alleviating it. The second important problem is noise, which may accompany the training data and cause a strong deterioration of classifier performance or increase the time required for training. Therefore, a desirable classification model should be robust to both skewed data distributions and noise. One of the most popular approaches for handling imbalanced data is oversampling of minority objects in their neighborhood. In this work we criticize this approach and propose a novel strategy for dealing with imbalanced data, with particular focus on the presence of noise. We propose the Radial-Based Oversampling (RBO) method, which finds regions in which synthetic objects from the minority class should be generated, based on an estimate of the imbalance distribution obtained with radial basis functions. Results of experiments carried out on a representative set of benchmark datasets confirm that the proposed guided synthetic oversampling algorithm offers an interesting alternative to popular state-of-the-art solutions for imbalanced data preprocessing. |
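
To make the idea of RBF-guided oversampling more concrete, the following is a minimal sketch and not the published RBO algorithm. It assumes a Gaussian radial basis function, a class potential defined as the majority contribution minus the minority contribution, and a simple greedy acceptance rule for perturbed minority points. The names `rbf_potential` and `guided_oversample`, and the parameters `gamma`, `step`, and `n_tries`, are hypothetical and chosen only for illustration.

```python
import math
import random


def rbf_potential(point, majority, minority, gamma=1.0):
    """Illustrative class potential built from Gaussian RBFs.

    Positive where majority objects dominate, negative where minority
    objects dominate. The exact form is an assumption of this sketch.
    """
    def rbf_sum(samples):
        total = 0.0
        for s in samples:
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(point, s)))
            total += math.exp(-(dist / gamma) ** 2)
        return total

    return rbf_sum(majority) - rbf_sum(minority)


def guided_oversample(majority, minority, n_new, gamma=1.0, step=0.5,
                      n_tries=20, seed=0):
    """Generate n_new synthetic minority points.

    Each point starts at a randomly chosen minority object and is randomly
    perturbed; a perturbation is kept only if it lowers the potential,
    i.e. moves toward a more minority-dominated region (assumed rule).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        current = list(rng.choice(minority))
        best = rbf_potential(current, majority, minority, gamma)
        for _ in range(n_tries):
            candidate = [c + rng.uniform(-step, step) for c in current]
            score = rbf_potential(candidate, majority, minority, gamma)
            if score < best:
                current, best = candidate, score
        synthetic.append(tuple(current))
    return synthetic
```

In this sketch a smaller `gamma` makes each object's influence more local, so the potential reacts more strongly to isolated objects, while a larger `gamma` smooths the estimate over a wider neighborhood.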
Year | DOI | Venue |
---|---|---|
2019 | 10.1016/j.neucom.2018.04.089 | Neurocomputing |
Keywords | Field | DocType |
Pattern classification,Machine learning,Imbalanced data,Oversampling,Radial basis functions,Noisy data | Training set,Radial basis function,Pattern recognition,Oversampling,Data pre-processing,Artificial intelligence,Data imbalance,Data classification,Classifier (linguistics),Mathematics,Machine learning | Journal |
Volume | ISSN | Citations |
343 | 0925-2312 | 4 |
PageRank | References | Authors |
0.38 | 41 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Michal Koziarski | 1 | 33 | 4.18 |
Bartosz Krawczyk | 2 | 721 | 60.97 |
Michał Woźniak | 3 | 213 | 24.64 |