Cluster-Based sampling approaches to imbalanced data distributions - Citegraph

Paper Info

Title
Cluster-Based sampling approaches to imbalanced data distributions

Abstract
In classification problems, the training data significantly influence classification accuracy. When a data set is highly imbalanced, classification algorithms tend to degenerate by assigning all cases to the most common outcome. Hence, it is important to select suitable training data when the class distribution is imbalanced. In this paper, we propose cluster-based under-sampling approaches that select representative data as training data to improve classification accuracy under an imbalanced class distribution. A neural network model is used as the base classification algorithm. The experimental results show that our cluster-based under-sampling approaches outperform the under-sampling techniques from previous studies.
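
The abstract does not spell out the selection procedure, so the following is only an illustrative sketch of one way cluster-based under-sampling could work, not the authors' algorithm. It assumes the whole training set is clustered with a small k-means, and that each cluster contributes majority-class samples in proportion to its majority-to-minority ratio, so that roughly as many majority samples as minority samples are kept. The names `kmeans`, `cluster_undersample`, `majority`, and `k` are hypothetical.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means on a list of tuples; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_undersample(X, y, majority, k=3, seed=0):
    """Return sorted indices to keep: every minority sample plus a
    cluster-weighted random subset of the majority samples.
    Assumes at least one majority sample is present."""
    rng = random.Random(seed)
    labels = kmeans(X, k, seed=seed)
    n_minority = sum(1 for t in y if t != majority)

    majority_by_cluster = defaultdict(list)
    minority_count = defaultdict(int)
    for i, (c, t) in enumerate(zip(labels, y)):
        if t == majority:
            majority_by_cluster[c].append(i)
        else:
            minority_count[c] += 1

    # Assumed weighting: majority/minority ratio per cluster, with the
    # minority count floored at 1 to avoid division by zero.
    weights = {c: len(idx) / max(minority_count[c], 1)
               for c, idx in majority_by_cluster.items()}
    total = sum(weights.values())

    keep = [i for i, t in enumerate(y) if t != majority]
    for c, idx in majority_by_cluster.items():
        quota = min(len(idx), round(n_minority * weights[c] / total))
        keep.extend(rng.sample(idx, quota))
    return sorted(keep)
```

The returned indices can be used to subset `X` and `y` before training any classifier. The per-cluster quotas are rounded, so the number of majority samples kept only approximates the minority count.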

Year	DOI	Venue
2006	10.1007/11823728_41	DaWaK
Keywords	Field	DocType
suitable training data,training data,representative data,classification accuracy,cluster-based under-sampling approach,classification algorithm,basic classification algorithm,classification problem,cluster-based sampling approach,under-sampling technique,imbalanced data distribution,sampling technique,distributed environment,neural network model	Data warehouse,Training set,Data mining,One-class classification,Computer science,Decision support system,Artificial intelligence,Knowledge extraction,Sampling (statistics),Statistical classification,Artificial neural network,Machine learning	Conference
Volume	ISSN	ISBN
4081	0302-9743	3-540-37736-0
Citations	PageRank	References
7	0.53	4
Authors
2

Authors (2 rows)

Name	Order	Citations	PageRank
Show-Jane Yen	1	537	130.05
Yue-Shi Lee	2	543	41.14
