Discovering Minority Sub-Clusters And Local Difficulty Factors From Imbalance Data - Citegraph

Paper Info

Title
Discovering Minority Sub-Clusters And Local Difficulty Factors From Imbalance Data

Abstract
Learning classifiers from imbalanced data is particularly challenging when class imbalance is accompanied by local data difficulty factors, such as outliers, rare cases, class overlapping, or minority class decomposition. Although these issues have been highlighted in previous research, no algorithm has yet been proposed that detects all of these difficulties in a dataset simultaneously. In this paper, we put forward two extensions of popular clustering algorithms, called ImKmeans and ImScan, and one novel algorithm, ImGrid, that attempt to detect minority sub-clusters, outliers, rare cases, and class overlapping. Experiments with artificial datasets show that ImGrid, which uses a Bayesian test to join similar neighboring regions, re-discovers simulated clusters and types of minority examples on par with competing methods, while being the least sensitive to parameter tuning.
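The minority example types named in the abstract (outliers, rare cases, and examples lying in class-overlapping regions) are often operationalised in the imbalanced-learning literature by inspecting each minority example's nearest neighbours. The sketch below illustrates only that neighbourhood-based labelling. It is not the paper's ImKmeans, ImScan, or ImGrid, and it does not implement the Bayesian region-merging test. The function name `minority_example_types`, the neighbourhood size K = 5, Euclidean distance, and the thresholds (4 to 5 same-class neighbours: safe; 2 to 3: borderline, i.e. overlapping; 1: rare; 0: outlier) are assumptions chosen for illustration.

```python
import numpy as np

# Assumed neighbourhood size; the thresholds below only make sense for K = 5.
K = 5


def minority_example_types(X, y, minority_label):
    """Hypothetical helper: map each minority example's index to one of
    'safe', 'borderline', 'rare' or 'outlier', based on how many of its
    K nearest neighbours (Euclidean distance) share the minority label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if len(X) <= K:
        raise ValueError(f"need more than {K} examples")

    # Brute-force pairwise squared Euclidean distances (small datasets only).
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # an example is not its own neighbour

    types = {}
    for i in np.flatnonzero(y == minority_label):
        neighbours = np.argsort(dist[i], kind="stable")[:K]
        same = int(np.sum(y[neighbours] == minority_label))
        # Assumed thresholds for K = 5.
        if same >= 4:
            label = "safe"
        elif same >= 2:
            label = "borderline"  # minority example in an overlapping region
        elif same == 1:
            label = "rare"
        else:
            label = "outlier"
        types[int(i)] = label
    return types
```

The brute-force distance matrix needs O(n²) memory, so beyond small artificial datasets a spatial index (for example scikit-learn's `NearestNeighbors`) would be a more practical way to find the K neighbours.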

Year: 2017
DOI: 10.1007/978-3-319-67786-6_23
Venue: DISCOVERY SCIENCE, DS 2017
Keywords: Class imbalance, Minority class categorization, Data difficulty factors, Class overlapping, Minority sub-clusters
Field: Data mining, Cluster (physics), Computer science, Outlier, Artificial intelligence, Cluster analysis, Machine learning, Bayesian probability
DocType: Conference
Volume: 10558
ISSN: 0302-9743
Citations: 2
PageRank: 0.37
References: 12
Authors: 4

Authors

Name	Order	Citations	PageRank
Mateusz Lango	1	7	3.12
Dariusz Brzezinski	2	213	11.28
Sebastian Firlik	3	2	0.37
Jerzy Stefanowski	4	1653	139.25
