Paper Info

Title
Empirical comparison of fast clustering algorithms for large data sets

Abstract
Several fast algorithms for clustering very large data sets have been proposed in the literature. CLARA is a combination of a sampling procedure and the classical PAM algorithm, while CLARANS adopts a serial randomized search strategy to find the optimal set of medoids. GAC-R3 and GAC-RARw exploit genetic search heuristics for solving clustering problems. In this research, we conducted an empirical comparison of these four clustering algorithms over a wide range of data characteristics. According to the experimental results, CLARANS outperforms its counterparts both in clustering quality and execution time when the number of clusters increases, clusters are more closely related, more asymmetric clusters are present, or more random objects exist in the data set. With a specific number of clusters, CLARA can efficiently achieve satisfactory clustering quality when the data size is larger, whereas GAC-R3 and GAC-RARw can achieve satisfactory clustering quality and efficiency when the data size is small, the number of clusters is small, and clusters are more distinct or symmetric.
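The two medoid-based strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the parameters (`num_local`, `max_neighbor`, `num_samples`, `sample_size`), the Euclidean distance, and the randomly initialized swap loop standing in for full PAM (which also has a BUILD phase) are all assumptions made for illustration.

```python
import math
import random


def cost(points, medoids):
    """Total distance from every point to its nearest medoid (medoids are indices)."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam_swap(points, k, rng):
    """Simplified PAM-style search: random start, then best-improving swaps.

    Assumption: real PAM uses a greedy BUILD phase instead of a random start.
    """
    n = len(points)
    medoids = rng.sample(range(n), k)
    current = cost(points, medoids)
    while True:
        best_trial, best_cost = None, current
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids.copy()
                trial[pos] = cand
                c = cost(points, trial)
                if c < best_cost:
                    best_trial, best_cost = trial, c
        if best_trial is None:  # no swap improves the cost: local optimum
            return medoids
        medoids, current = best_trial, best_cost


def clara(points, k, num_samples=5, sample_size=40, seed=0):
    """CLARA-style: run the PAM-like search on random samples, score on all points."""
    rng = random.Random(seed)
    n = len(points)
    best, best_cost = None, math.inf
    for _ in range(num_samples):
        idx = rng.sample(range(n), min(sample_size, n))
        local = pam_swap([points[i] for i in idx], k, rng)
        medoids = [idx[i] for i in local]  # map sample positions back to the data set
        c = cost(points, medoids)
        if c < best_cost:
            best, best_cost = sorted(medoids), c
    return best, best_cost


def clarans(points, k, num_local=2, max_neighbor=50, seed=0):
    """CLARANS-style: serial randomized search over single-medoid swaps."""
    rng = random.Random(seed)
    n = len(points)
    best, best_cost = None, math.inf
    for _ in range(num_local):
        current = rng.sample(range(n), k)
        current_cost = cost(points, current)
        failures = 0
        while failures < max_neighbor:
            # A random neighbor differs from the current set in exactly one medoid.
            neighbor = current.copy()
            neighbor[rng.randrange(k)] = rng.choice(
                [i for i in range(n) if i not in current]
            )
            c = cost(points, neighbor)
            if c < current_cost:
                current, current_cost = neighbor, c
                failures = 0
            else:
                failures += 1
        if current_cost < best_cost:
            best, best_cost = sorted(current), current_cost
    return best, best_cost
```

Both functions return a sorted list of medoid indices and the total cost over the full data set. In this sketch, CLARA's cost per search step depends on the sample size rather than the data size, while CLARANS evaluates every candidate swap against all points but only examines a random subset of the possible swaps.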

Year	2000
DOI	10.1109/HICSS.2000.926655
Venue	HICSS
DocType	Conference
ISBN	0-7695-0493-0
Citations	10
PageRank	0.75
References	4
Authors	3
Keywords	clara, pattern clustering, large data sets, asymmetric clusters, search problems, fast clustering algorithms, genetic algorithms, genetic search heuristics, data mining, random objects, serial randomized search strategy, sampling procedure, classical pam algorithm, random search, genetics
Field	Canopy clustering algorithm, Data mining, CURE data clustering algorithm, Data stream clustering, Correlation clustering, Computer science, Determining the number of clusters in a data set, Constrained clustering, Cluster analysis, Single-linkage clustering

Authors (3 rows)

Name	Order	Citations	PageRank
Chih-ping Wei	1	743	74.20
Yen-hsien Lee	2	118	16.64
Che-ming Hsu	3	23	4.00