Title | ||
---|---|---|
A Hybrid Surrogate Model for Evolutionary Undersampling in Imbalanced Classification. |
Abstract | ||
---|---|---|
Data preprocessing is a key stage in data mining that allows machine learning algorithms to obtain meaningful insights. Many preprocessing problems, such as feature selection or instance selection, can be modelled as optimisation or search problems. Evolutionary algorithms have traditionally excelled at this task when dealing with data of moderate size. However, their application to large datasets typically involves very high computational costs. In this work, we propose a hybrid surrogate model for evolutionary undersampling in imbalanced classification problems. These problems are characterised by a highly skewed class distribution, and evolutionary algorithms aim to balance the training data by selecting only the most relevant instances. The proposed technique combines a two-stage clustering-based surrogate method with a windowing approach to quickly approximate the fitness values of the chromosomes and accelerate the search. Experiments on 44 standard imbalanced datasets show that the proposed hybrid surrogate model greatly reduces the computational cost of the evolutionary algorithm without a considerable loss of performance. |
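
To make the windowing idea in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary chromosome over the majority-class instances, a leave-one-out 1-NN classifier scored by the geometric mean of per-class recall, and a simple rotation through stratified windows so that each generation scores chromosomes on only part of the data. The two-stage clustering-based surrogate is not shown. All function names, the toy data and the genetic operators are hypothetical choices made for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def fitness(chromosome, data, majority_idx, minority_idx, window=None):
    """G-mean of leave-one-out 1-NN using the subset encoded by `chromosome`.

    Assumption of this sketch: `window` lists the instance indices to score;
    None means score every instance (the full, expensive evaluation).
    """
    # All minority instances are kept; majority instances are kept where bit == 1.
    selected = minority_idx + [i for i, bit in zip(majority_idx, chromosome) if bit]
    scored = window if window is not None else range(len(data))
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for i in scored:
        x, y = data[i]
        # Leave-one-out: an instance may not be its own neighbour.
        candidates = [j for j in selected if j != i]
        nearest = min(candidates, key=lambda j: sq_dist(data[j][0], x))
        totals[y] += 1
        hits[y] += int(data[nearest][1] == y)
    if totals[0] == 0 or totals[1] == 0:
        return 0.0
    return math.sqrt((hits[0] / totals[0]) * (hits[1] / totals[1]))


def make_windows(majority_idx, minority_idx, k):
    """Split the indices into k stratified windows that each contain both classes."""
    return [majority_idx[w::k] + minority_idx[w::k] for w in range(k)]


def evolve(data, generations=20, pop_size=10, k=2, seed=0):
    """Toy generational GA; each generation is scored on one window only."""
    rng = random.Random(seed)
    majority_idx = [i for i, (_, y) in enumerate(data) if y == 0]
    minority_idx = [i for i, (_, y) in enumerate(data) if y == 1]
    windows = make_windows(majority_idx, minority_idx, k)
    pop = [[rng.randint(0, 1) for _ in majority_idx] for _ in range(pop_size)]
    for g in range(generations):
        window = windows[g % k]  # rotate through the windows
        pop.sort(
            key=lambda c: fitness(c, data, majority_idx, minority_idx, window),
            reverse=True,
        )
        parents = pop[: pop_size // 2]  # truncation selection keeps the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child[rng.randrange(len(child))] ^= 1  # flip one random bit
            children.append(child)
        pop = parents + children
    # The final pick uses the full (non-windowed) fitness.
    best = max(pop, key=lambda c: fitness(c, data, majority_idx, minority_idx))
    return best, fitness(best, data, majority_idx, minority_idx)


def toy_data(seed=0):
    """Hypothetical imbalanced 2-D data: 40 majority (label 0), 8 minority (label 1)."""
    rng = random.Random(seed)
    majority = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(40)]
    minority = [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(8)]
    return majority + minority


if __name__ == "__main__":
    best, score = evolve(toy_data())
    print(sum(best), "majority instances kept, full-data G-mean:", round(score, 3))
```

In this sketch each windowed evaluation scores roughly 1/k of the instances, which is where the saving comes from. The windowed score only approximates the full one, so the final chromosome is chosen with the full fitness.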
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/CEC48606.2020.9185774 | CEC |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hoang Lam Le | 1 | 0 | 0.34 |
Dario Landa Silva | 2 | 316 | 28.38 |
Mikel Galar | 3 | 1003 | 40.90 |
Salvador García | 4 | 4151 | 118.45 |
Isaac Triguero | 5 | 633 | 31.76 |