Title
An Incremental Hierarchical Data Clustering Algorithm Based on Gravity Theory
Abstract
One of the main challenges in the design of modern clustering algorithms is that, in many applications, new data sets are continuously added to an already huge database. As a result, it is impractical to carry out data clustering from scratch whenever new data instances are added to the database. One way to tackle this challenge is to use a clustering algorithm that operates incrementally. Another desirable feature of a clustering algorithm is the generation of a clustering dendrogram. This feature is crucial for many applications in biological, social, and behavioral studies, due to the need to construct taxonomies. This paper presents the GRIN algorithm, an incremental hierarchical clustering algorithm for numerical data sets based on gravity theory in physics. The GRIN algorithm delivers favorable clustering quality and generally features O(n) time complexity. One main factor that enables the GRIN algorithm to deliver favorable clustering quality is that its optimal parameter settings are not sensitive to the distribution of the data set. In contrast, many modern clustering algorithms suffer from unreliable or poor clustering quality when the data set contains highly skewed local distributions, because no optimal values can then be found for some global parameters. This paper also reports experiments conducted to study the characteristics of the GRIN algorithm.
Year: 2002
DOI: 10.1007/3-540-47887-6_23
Venue: PAKDD
Keywords: new data set, favorite clustering quality, incremental hierarchical data clustering, grin algorithm, modern clustering algorithm, incremental hierarchical clustering algorithm, gravity theory, new data instance, clustering dendrogram, poor clustering quality, clustering algorithm, time complexity, hierarchical clustering, data clustering, hierarchical data
Field: Data mining, Fuzzy clustering, Canopy clustering algorithm, CURE data clustering algorithm, Data stream clustering, Correlation clustering, Computer science, Determining the number of clusters in a data set, Constrained clustering, Artificial intelligence, Cluster analysis, Machine learning
DocType: Conference
ISBN: 3-540-43704-5
Citations: 14
PageRank: 0.74
References: 9
Authors: 3
Name | Order | Citations | PageRank
Chien-Yu Chen | 1 | 367 | 29.24
Shien-ching Hwang | 2 | 141 | 10.55
Yen-Jen Oyang | 3 | 423 | 48.82