Scalable Clustering For Large High-Dimensional Data Based On Data Summarization - Citegraph

Paper Info

Title
Scalable Clustering For Large High-Dimensional Data Based On Data Summarization

Abstract
Clustering large data sets with high dimensionality is a challenging data-mining task. This paper presents a framework to perform such a task efficiently. It is based on the notion of data space reduction, which finds high density areas, or dense cells, in the given feature space. The dense cells store summarized information of the data. A designated partitioning or hierarchical clustering algorithm can be used as the second step to find clusters based on the data summaries. Using Kmeans as an example, this paper presents GARDEN-Kmeans, which performs data space reduction using Gamma Region DENsity partition, and utilizes Kmeans to cluster the summarized information. The experimental study shows that GARDEN-Kmeans executes several orders of magnitude faster than basic Kmeans and the recursive bisection Kmeans algorithm of CLUTO, while producing comparable clustering quality.
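The two-step idea described in the abstract (summarize dense cells, then cluster the summaries) can be illustrated with a minimal sketch. It is not the paper's Gamma Region DENsity partition, which the abstract does not specify. A plain uniform grid stands in for it, and the names `summarize`, `weighted_kmeans`, `cell_width`, and `min_count` are hypothetical choices made only for this illustration.

```python
import random
from collections import defaultdict


def summarize(points, cell_width, min_count):
    """Bin points into a uniform grid (an assumed stand-in for the paper's
    partition) and keep a (count, centroid) summary for each dense cell."""
    sums = {}
    counts = defaultdict(int)
    for p in points:
        key = tuple(int(x // cell_width) for x in p)
        counts[key] += 1
        if key in sums:
            sums[key] = [s + x for s, x in zip(sums[key], p)]
        else:
            sums[key] = list(p)
    # Cells with fewer than min_count points are treated as sparse and dropped.
    return [
        (n, tuple(s / n for s in sums[key]))
        for key, n in counts.items()
        if n >= min_count
    ]


def weighted_kmeans(summaries, k, iters=20, seed=0):
    """Lloyd's algorithm over cell centroids, each weighted by its point count."""
    rng = random.Random(seed)
    centers = [c for _, c in rng.sample(summaries, k)]
    dim = len(centers[0])
    for _ in range(iters):
        acc = [[0.0] * dim for _ in range(k)]
        weight = [0] * k
        for n, c in summaries:
            # Assign the whole cell to its nearest center.
            j = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(c, centers[i])),
            )
            weight[j] += n
            for d in range(dim):
                acc[j][d] += n * c[d]
        # Keep the old center if no cell was assigned to it.
        centers = [
            tuple(a / weight[j] for a in acc[j]) if weight[j] else centers[j]
            for j in range(k)
        ]
    return centers


# Synthetic demo data: two well-separated 2-D Gaussian blobs.
rng = random.Random(1)
points = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(2000)]
points += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(2000)]

summaries = summarize(points, cell_width=1.0, min_count=5)
centers = sorted(weighted_kmeans(summaries, k=2))
print(len(points), "points reduced to", len(summaries), "dense cells")
print(centers)
```

In this sketch the clustering step only ever touches the cell summaries, so its cost per iteration grows with the number of dense cells rather than with the number of original points.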

Year	DOI	Venue
2007	10.1109/CIDM.2007.368910	2007 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DATA MINING, VOLS 1 AND 2
Keywords	Field	DocType
hierarchical clustering,feature space,high dimensional data,data mining,data handling	Fuzzy clustering,Data mining,CURE data clustering algorithm,Computer science,Artificial intelligence,Cluster analysis,Single-linkage clustering,Canopy clustering algorithm,Data stream clustering,Pattern recognition,Correlation clustering,Determining the number of clusters in a data set,Machine learning	Conference
Citations	PageRank	References
1	0.41	20
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Ying Lai	1	12	1.10
Ratko Orlandic	2	116	23.50
Wai Gen Yee	3	301	27.33
Sachin Kulkarni	4	151	8.87
