Title
Class imbalance and the curse of minority hubs
Abstract
Most machine learning tasks involve learning from high-dimensional data, which is often quite difficult to handle. Hubness is an aspect of the curse of dimensionality that was shown to be highly detrimental to k-nearest neighbor methods in high-dimensional feature spaces. Hubs, very frequent nearest neighbors, emerge as centers of influence within the data and often act as semantic singularities. This paper deals with evaluating the impact of hubness on learning under class imbalance with k-nearest neighbor methods. Our results suggest that, contrary to common belief, minority class hubs might be responsible for most of the misclassification in many high-dimensional datasets. The standard approaches to learning under class imbalance usually favor the instances of the minority class and are not well suited for handling such highly detrimental minority points. In our experiments, we have evaluated several state-of-the-art hubness-aware kNN classifiers that are based on learning from the neighbor occurrence models calculated from the training data. The experiments included learning under severe class imbalance, class overlap, and mislabeling, and the results suggest that the hubness-aware methods usually achieve promising results on the examined high-dimensional datasets. The improvements seem to be most pronounced when handling the difficult point types: borderline points, rare points and outliers. On most examined datasets, the hubness-aware approaches improve the classification precision of the minority classes and the recall of the majority class, which helps with reducing the negative impact of minority hubs. We argue that it might prove beneficial to combine the extensible hubness-aware voting frameworks with the existing class-imbalanced kNN classifiers, in order to properly handle class-imbalanced data in high-dimensional feature spaces.
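The neighbor occurrence counts that the abstract refers to can be illustrated with a minimal sketch. It is not the paper's method or any of the evaluated classifiers. It only counts how often each point appears in other points' k-nearest-neighbor lists, and how many of those occurrences involve a label mismatch (commonly called bad occurrences in the hubness literature). The function names `k_occurrences` and `skewness`, the synthetic Gaussian data, the class sizes, and the choice of k = 5 are all assumptions made for illustration.

```python
import numpy as np


def k_occurrences(X, y, k=5):
    """Return (total, bad) k-occurrence counts for every point.

    total[i]: number of other points that have point i among their k nearest neighbors.
    bad[i]:   number of those occurrences where the labels of the two points differ.
    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]
    # Pairwise squared Euclidean distances; the diagonal is set to infinity
    # so that a point is never counted as its own neighbor.
    sq = np.sum(X ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]            # neighbor indices, shape (n, k)
    total = np.bincount(nn.ravel(), minlength=n)
    mismatch = y[nn] != y[:, None]               # neighbor label differs from query label
    bad = np.bincount(nn[mismatch], minlength=n)
    return total, bad


def skewness(v):
    """Third standardized moment; a positive value indicates a long right tail (hubs)."""
    v = np.asarray(v, dtype=float)
    s = v.std()
    return 0.0 if s == 0 else float(np.mean((v - v.mean()) ** 3) / s ** 3)


# Synthetic, imbalanced, overlapping two-class data (an assumption, not the paper's data).
rng = np.random.default_rng(0)
n_major, n_minor, dim, k = 450, 50, 100, 5
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_major, dim)),
    rng.normal(0.3, 1.0, size=(n_minor, dim)),
])
y = np.array([0] * n_major + [1] * n_minor)

total, bad = k_occurrences(X, y, k=k)
print("skewness of the k-occurrence distribution:", skewness(total))
for label, name in ((0, "majority"), (1, "minority")):
    mask = y == label
    print(name, "points: total occurrences =", int(total[mask].sum()),
          "| bad occurrences =", int(bad[mask].sum()))
```

Comparing the bad occurrence totals of the two classes gives a rough view of which class contributes more label-mismatched neighbors on a given dataset. On real data the counts would normally be computed on the training set only.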
Year: 2013
DOI: 10.1016/j.knosys.2013.08.031
Venue: Knowl.-Based Syst.
Keywords: minority class, minority class hub, high-dimensional datasets, detrimental minority point, existing class, severe class imbalance, majority class, class imbalanced data, class imbalance, high-dimensional feature space, minority hub, curse of dimensionality, k nearest neighbor, classification
Field: Training set, k-nearest neighbors algorithm, Data mining, Voting, Computer science, Curse, Outlier, Curse of dimensionality, Artificial intelligence, Recall, Machine learning
DocType: Journal
Volume: 53
ISSN: 0950-7051
Citations: 12
PageRank: 0.53
References: 71
Authors: 2
Name            Order  Citations  PageRank
Nenad Tomasev   1      98         7.60
Dunja Mladenic  2      1484       170.14