An Incremental Hierarchical Data Clustering Algorithm Based on Gravity Theory - Citegraph

Paper Info

Title
An Incremental Hierarchical Data Clustering Algorithm Based on Gravity Theory

Abstract
One of the main challenges in the design of modern clustering algorithms is that, in many applications, new data sets are continuously added to an already huge database. As a result, it is impractical to carry out data clustering from scratch whenever new data instances are added to the database. One way to tackle this challenge is to use a clustering algorithm that operates incrementally. Another desirable feature of a clustering algorithm is that it generates a clustering dendrogram. This feature is crucial for many applications in biological, social, and behavioral studies, due to the need to construct taxonomies. This paper presents the GRIN algorithm, an incremental hierarchical clustering algorithm for numerical data sets based on gravity theory in physics. The GRIN algorithm delivers favorable clustering quality and generally features O(n) time complexity. One main factor that enables the GRIN algorithm to deliver favorable clustering quality is that its optimal parameter settings are not sensitive to the distribution of the data set. In contrast, many modern clustering algorithms suffer from unreliable or poor clustering quality when the data set contains highly skewed local distributions, so that no optimal values can be found for some global parameters. This paper also reports experiments conducted to study the characteristics of the GRIN algorithm.

Year
2002

DOI
10.1007/3-540-47887-6_23

Venue
PAKDD

Keywords
new data set,favorite clustering quality,incremental hierarchical data clustering,grin algorithm,modern clustering algorithm,incremental hierarchical clustering algorithm,gravity theory,new data instance,clustering dendrogram,poor clustering quality,clustering algorithm,time complexity,hierarchical clustering,data clustering,hierarchical data

Field
Data mining,Fuzzy clustering,Canopy clustering algorithm,CURE data clustering algorithm,Data stream clustering,Correlation clustering,Computer science,Determining the number of clusters in a data set,Constrained clustering,Artificial intelligence,Cluster analysis,Machine learning

DocType
Conference

ISBN
3-540-43704-5

Citations
14

PageRank
0.74

References
9

Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Chien-Yu Chen	1	367	29.24
Shien-Ching Hwang	2	141	10.55
Yen-Jen Oyang	3	423	48.82
