Abstract |
---|
One of the main challenges in the design of modern clustering algorithms is that, in many applications, new data sets are continuously added to an already huge database. As a result, it is impractical to carry out data clustering from scratch whenever new data instances are added to the database. One way to tackle this challenge is to use a clustering algorithm that operates incrementally. Another desirable feature of a clustering algorithm is that it generates a clustering dendrogram. This feature is crucial for many applications in biological, social, and behavioral studies, because of the need to construct taxonomies. This paper presents the GRIN algorithm, an incremental hierarchical clustering algorithm for numerical data sets based on gravity theory in physics. The GRIN algorithm delivers favorable clustering quality and generally features O(n) time complexity. One main reason the GRIN algorithm delivers favorable clustering quality is that its optimal parameter settings are not sensitive to the distribution of the data set. By contrast, many modern clustering algorithms suffer from unreliable or poor clustering quality when the data set contains highly skewed local distributions, because no optimal values can then be found for some global parameters. This paper also reports experiments conducted to study the characteristics of the GRIN algorithm. |
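The abstract describes gravity-based incremental clustering only at a high level. The following is a generic sketch of that idea, not the GRIN algorithm itself: it treats each cluster as a point mass located at its centroid, lets a new instance join the cluster exerting the largest pull m/d² (with the gravitational constant and the instance's own mass both assumed to be 1) when that cluster lies within a hypothetical `max_radius`, and otherwise opens a new cluster. The names `Cluster`, `gravitational_pull`, `insert_point`, and `max_radius` are illustrative assumptions. The sketch shows why incremental insertion avoids reclustering from scratch, since each new instance costs O(k·d) work for k clusters in d dimensions; it does not build the dendrogram that GRIN produces.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    centroid: list[float]  # running mean of the absorbed instances
    mass: int              # number of instances absorbed so far


def gravitational_pull(cluster: Cluster, point: tuple[float, ...]) -> float:
    """Pull m / d^2 on a unit-mass point (assumed G = 1; illustrative only)."""
    d2 = sum((c - p) ** 2 for c, p in zip(cluster.centroid, point))
    return math.inf if d2 == 0 else cluster.mass / d2


def insert_point(clusters: list[Cluster], point: tuple[float, ...],
                 max_radius: float) -> Cluster:
    """Hypothetical incremental step: absorb the point or start a new cluster."""
    if clusters:
        best = max(clusters, key=lambda c: gravitational_pull(c, point))
        if math.dist(best.centroid, point) <= max_radius:
            best.mass += 1
            # Running-mean update: no pass over previously inserted data.
            best.centroid = [c + (p - c) / best.mass
                             for c, p in zip(best.centroid, point)]
            return best
    new_cluster = Cluster(centroid=list(point), mass=1)
    clusters.append(new_cluster)
    return new_cluster


# Usage: stream five 2-D instances into an initially empty model.
clusters: list[Cluster] = []
for p in [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]:
    insert_point(clusters, p, max_radius=1.0)
```

Here the five instances end up in two clusters with masses 3 and 2. Choosing the cluster by mass over squared distance, rather than by distance alone, lets heavier clusters attract points from slightly farther away, which is the intuition the gravity analogy is meant to capture.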
Year | DOI | Venue |
---|---|---|
2002 | 10.1007/3-540-47887-6_23 | PAKDD |
Keywords | Field | DocType |
---|---|---|
new data set,favorite clustering quality,incremental hierarchical data clustering,grin algorithm,modern clustering algorithm,incremental hierarchical clustering algorithm,gravity theory,new data instance,clustering dendrogram,poor clustering quality,clustering algorithm,time complexity,hierarchical clustering,data clustering,hierarchical data | Data mining,Fuzzy clustering,Canopy clustering algorithm,CURE data clustering algorithm,Data stream clustering,Correlation clustering,Computer science,Determining the number of clusters in a data set,Constrained clustering,Artificial intelligence,Cluster analysis,Machine learning | Conference |
ISBN | Citations | PageRank |
---|---|---|
3-540-43704-5 | 14 | 0.74 |
References | Authors |
---|---|
9 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Chien-Yu Chen | 1 | 367 | 29.24 |
Shien-ching Hwang | 2 | 141 | 10.55 |
Yen-Jen Oyang | 3 | 423 | 48.82 |