Title
A statistics-based approach to control the quality of subclusters in incremental gravitational clustering
Abstract
As the sizes of many contemporary databases continue to grow rapidly, incremental clustering has emerged as an essential issue for conducting data analysis on contemporary databases. An incremental clustering algorithm refers to an abstraction of the distribution of the data instances generated by the previous run of the algorithm and therefore is able to cope well with the ever-growing contemporary databases. There are two main challenges in the design of incremental clustering algorithms. The first challenge is how to reduce information loss due to the data abstraction (or summarization) operations. The second challenge is that the clustering result should not be sensitive to the order of input data. This paper presents the GRIN algorithm, an incremental hierarchical clustering algorithm for numerical datasets based on the gravity theory in physics. In the design of GRIN, a statistical test aimed at reducing information loss and distortion is employed to control formation of subclusters as well as to monitor the evolution of the dataset. Due to the statistical test-based summarization approach, GRIN is able to achieve near linear scalability and is not sensitive to input ordering.
Year
2005
DOI
10.1016/j.patcog.2005.03.005
Venue
Pattern Recognition
Keywords
incremental gravitational clustering, data analysis, grin algorithm, data abstraction, incremental hierarchical clustering algorithm, information loss, statistics-based approach, incremental clustering, contemporary databases, incremental clustering algorithm, clustering result, data instance, statistical test, hierarchical clustering, data clustering
Field
Data mining, Fuzzy clustering, CURE data clustering algorithm, Computer science, Artificial intelligence, Cluster analysis, Canopy clustering algorithm, Data stream clustering, Pattern recognition, Correlation clustering, Determining the number of clusters in a data set, Constrained clustering, Statistics, Machine learning
DocType
Journal
Volume
38
Issue
12
ISSN
Pattern Recognition
Citations
13
PageRank
0.63
References
12
Authors
3
Name               Order  Citations  PageRank
Chien-Yu Chen      1      367        29.24
Shien-ching Hwang  2      141        10.55
Yen-Jen Oyang      3      423        48.82