Abstract | ||
---|---|---|
Imbalanced data arise when samples are unequally distributed across classes. This poses a challenge for classification methods: with far fewer minority instances than majority instances, the minority class is hard to learn and predict. One approach to class imbalance is oversampling, which generates synthetic samples around given minority instances based on their nearest neighbors so that the numbers of majority and minority instances are balanced. However, if the nearest neighbors are poorly chosen, oversampling may cause overfitting or underfitting. We propose a novel oversampling method for handling imbalanced data efficiently. The proposed method generates synthetic samples and decides whether to accept or reject each one based on its location. With the proposed method, we observed improved results on real-world imbalanced datasets. In addition, the proposed method is less sensitive than existing approaches to how the nearest neighbors used for generating synthetic samples are chosen. |
Year | DOI | Venue |
---|---|---|
2014 | 10.1145/2701126.2701181 | IMCOM |
Keywords | Field | DocType |
rejection rule,feature evaluation and selection,data preprocessing,experimentation,pattern analysis,synthetic minority oversampling technique,classifier design and evaluation,imbalanced problem,data distribution,performance | Small number,Data mining,Oversampling,Computer science,Data pre-processing,Artificial intelligence,Overfitting,Machine learning | Conference |
Citations | PageRank | References |
0 | 0.34 | 8 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jaedong Lee | 1 | 13 | 1.63 |
Noo-ri Kim | 2 | 27 | 4.55 |
Jee-Hyong Lee | 3 | 316 | 49.65 |
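
The abstract describes generating synthetic minority samples from nearest neighbors and then accepting or rejecting each one by its location. A minimal Python sketch of that general idea follows. The abstract does not state the actual rejection rule, so the rule used here (accept a candidate only if its nearest original sample belongs to the minority class) is an assumption made for illustration, as are the function name `oversample_with_rejection` and the parameters `k`, `max_tries`, and `seed`. It is not the authors' implementation.

```python
import numpy as np


def oversample_with_rejection(X, y, minority_label, k=5, max_tries=10000, seed=0):
    """SMOTE-style oversampling with a hypothetical location-based rejection rule."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")

    # Number of synthetic samples needed to balance the two groups.
    n_needed = int(np.sum(y != minority_label)) - len(X_min)
    k = min(k, len(X_min) - 1)

    accepted = []
    tries = 0
    while len(accepted) < n_needed and tries < max_tries:
        tries += 1
        base = X_min[rng.integers(len(X_min))]

        # k nearest minority neighbors of the base point; position 0 of the
        # sorted order is the base itself (assuming no duplicate points).
        dist = np.linalg.norm(X_min - base, axis=1)
        neighbors = np.argsort(dist)[1:k + 1]
        neighbor = X_min[rng.choice(neighbors)]

        # Interpolate a candidate on the segment between base and neighbor.
        candidate = base + rng.random() * (neighbor - base)

        # ASSUMED rejection rule (not taken from the paper): keep the
        # candidate only if its nearest original sample is a minority sample.
        nearest = np.argmin(np.linalg.norm(X - candidate, axis=1))
        if y[nearest] == minority_label:
            accepted.append(candidate)

    X_new = np.array(accepted).reshape(-1, X.shape[1])
    y_new = np.full(len(X_new), minority_label)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])
```

Calling `oversample_with_rejection(X, y, minority_label=1, k=3)` returns the original data with the accepted synthetic samples appended. If `max_tries` is reached first, the result may remain imbalanced, so the caller should check the class counts.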