Title
Synchronization clustering based on central force optimization and its extension for large-scale datasets.
Abstract
Although clustering has been an active research area in recent years, most current clustering methods require the number of clusters or other user-specified parameters to be set in advance, and they also perform inefficiently on large-scale datasets. In this paper, we study the clustering problem by exploring the metaphor of gravitational kinematics based on Central Force Optimization (CFO). Unlike the global synchronization of CFO, however, we propose a new algorithm, G-Sync, that simulates the partial synchronization phenomenon. Specifically, we view each data object as a probe and continually simulate the dynamic interaction behaviour of data objects in the gravitational field. As time evolves, similar data objects naturally come into partial synchronization and form distinct clusters, as measured by the proposed degree of local synchronization. By introducing the Davies-Bouldin (DB) index, G-Sync can determine clusters of arbitrary size, shape and density, and it does not require the number of clusters to be set in advance. The algorithm is further extended to large-scale datasets with the scalable S-G-Sync algorithm, which is based on fast kernel density estimation (FastKDE). S-G-Sync first quickly condenses a large-scale dataset into a reduced dataset and then performs adaptive clustering on the reduced dataset using G-Sync. Finally, the Clustering on Remaining Objects (CRO) algorithm is proposed to cluster the remaining objects in the large-scale dataset and to capture outliers and singleton clusters effectively. The effectiveness of the G-Sync and S-G-Sync algorithms is theoretically analysed and experimentally verified on synthetic and real-world datasets.
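The abstract names the Davies-Bouldin (DB) index but does not say how G-Sync applies it. As a rough illustration only, the sketch below computes the standard DB index for a labelled point set. The function name `davies_bouldin` and the toy data are assumptions made for this example and are not taken from the paper. Lower values indicate more compact, better separated clusters.

```python
import math

def davies_bouldin(points, labels):
    """Standard Davies-Bouldin index (hypothetical helper; lower is better)."""
    # Group points by their cluster label.
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the DB index needs at least two clusters")

    # Centroid of each cluster (coordinate-wise mean).
    centroids = {k: [sum(c) / len(m) for c in zip(*m)]
                 for k, m in clusters.items()}
    # Scatter of each cluster: mean distance of its members to the centroid.
    scatter = {k: sum(math.dist(p, centroids[k]) for p in m) / len(m)
               for k, m in clusters.items()}

    # For each cluster, take the worst (largest) similarity ratio to any
    # other cluster, then average over clusters. Coincident centroids
    # would raise ZeroDivisionError in this simple sketch.
    total = 0.0
    for i in clusters:
        total += max((scatter[i] + scatter[j])
                     / math.dist(centroids[i], centroids[j])
                     for j in clusters if j != i)
    return total / len(clusters)

# Toy data (assumed): two tight pairs of points that are far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [0, 0, 1, 1]   # groups nearby points together
bad = [0, 1, 0, 1]    # mixes the two groups
best = min([good, bad], key=lambda lab: davies_bouldin(points, lab))
```

One plausible use, again an assumption rather than the paper's procedure, is to score several candidate partitions and keep the one with the smallest index, as the last line does.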
Year
2017
DOI
10.1016/j.knosys.2016.11.007
Venue
Knowl.-Based Syst.
Keywords
Gravitational kinematics, Central force optimization, Partial synchronization, Synchronization clustering, Fast kernel density estimation, Large-scale datasets
Field
Data mining, Canopy clustering algorithm, Fuzzy clustering, CURE data clustering algorithm, Data stream clustering, Correlation clustering, Computer science, Consensus clustering, Artificial intelligence, FLAME clustering, Cluster analysis, Machine learning
DocType
Journal
Volume
118
Issue
C
ISSN
0950-7051
Citations
5
PageRank
0.45
References
21
Authors
3
Name, Order, Citations, PageRank
Wenlong Hang, 1, 5, 0.45
Kup-Sze Choi, 2, 526, 47.41
Shitong Wang, 3, 1485, 109.13