Title
Efficient distributed clustering using boundary information
Abstract
In the era of big data, large amounts of data are increasingly generated across multiple distributed sites and cannot be gathered at a central site for analysis, which invalidates the centralized-model assumption underlying traditional clustering techniques. The major challenge is that these distributed datasets cannot be trivially merged because of privacy concerns, limited network bandwidth among sites, and the limited computational capacity of any single site. To tackle this challenge, we propose an efficient distributed clustering scheme using boundary information (DCUBI), which features good flexibility and scalability. DCUBI follows a three-step local-global-local procedure. First, each local site extracts the boundary points from its own data and applies a traditional clustering algorithm to the boundary points only. Second, the labeled boundary points from each site are sent to the central site as local representatives, where boundary and cluster fusion is conducted to form the global clustering model. Finally, the global boundary and cluster information is sent back to each local site for refined local clustering. To demonstrate the effectiveness of DCUBI, we plug the well-known DBSCAN algorithm into it and conduct comprehensive experiments on datasets with different properties. The experimental results verify the clustering quality of DCUBI as well as its superior time efficiency when the volume of data at each site is large. Furthermore, other popular clustering techniques, especially those with high time complexity such as spectral clustering and affinity propagation, are also plugged into DCUBI to demonstrate the flexibility of the proposed scheme. © 2017 Elsevier B.V. All rights reserved.
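The local-global-local flow described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the abstract does not specify how boundary points are detected or how fusion is performed, so every function below is a hypothetical stand-in. In particular, the boundary test (neighbourhood centroid displaced by more than `shift * eps`), the base clusterer (connected components of the eps-graph, which behaves like DBSCAN with a minimum neighbourhood size of 1), the fusion rule (merge local clusters whose boundary points lie within `eps` of each other), and the refinement rule (nearest global boundary point) are all assumptions chosen for brevity.

```python
import math

def neighbors(p, pts, eps):
    """All points of pts within distance eps of p (p itself included)."""
    return [q for q in pts if math.dist(p, q) <= eps]

def extract_boundary(pts, eps, shift=0.25):
    """Assumed heuristic: p is a boundary point if the centroid of its
    eps-neighbourhood is displaced from p by more than shift * eps."""
    out = []
    for p in pts:
        nb = neighbors(p, pts, eps)
        centroid = tuple(sum(c) / len(nb) for c in zip(*nb))
        if math.dist(p, centroid) > shift * eps:
            out.append(p)
    return out

def cluster(pts, eps):
    """Stand-in base clusterer: connected components of the eps-graph.
    Returns {point: label}."""
    labels, next_label = {}, 0
    for seed in pts:
        if seed in labels:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:
            p = stack.pop()
            for q in neighbors(p, pts, eps):
                if q not in labels:
                    labels[q] = next_label
                    stack.append(q)
        next_label += 1
    return labels

def local_step(site_pts, eps):
    """Step 1 (local): cluster the boundary points only."""
    return cluster(extract_boundary(site_pts, eps), eps)

def global_step(site_models, eps):
    """Step 2 (global): merge local clusters from different sites whose
    boundary points lie within eps of each other. Returns {point: label}."""
    tagged = [(p, (s, lab)) for s, m in enumerate(site_models)
              for p, lab in m.items()]
    parent = {key: key for _, key in tagged}

    def find(k):
        while parent[k] != k:
            k = parent[k]
        return k

    for p, kp in tagged:
        for q, kq in tagged:
            if kp != kq and math.dist(p, q) <= eps:
                parent[find(kp)] = find(kq)
    roots = sorted({find(k) for _, k in tagged})
    return {p: roots.index(find(k)) for p, k in tagged}

def refine(site_pts, global_model):
    """Step 3 (local): each point takes the label of the nearest
    global boundary point."""
    return {p: global_model[min(global_model, key=lambda b: math.dist(p, b))]
            for p in site_pts}

def grid(x0, n=4):
    """Toy data: an n-by-n unit grid whose lower-left corner is (x0, 0)."""
    return [(float(x), float(y)) for x in range(x0, x0 + n) for y in range(n)]

# Site 0 holds the left half of one cluster; site 1 holds its right half
# plus a second, distant cluster.
sites = [grid(0), grid(4) + grid(20)]
eps = 1.5
local_models = [local_step(s, eps) for s in sites]
global_model = global_step(local_models, eps)
final = [refine(s, global_model) for s in sites]
sent = sum(len(m) for m in local_models)
total = sum(len(s) for s in sites)
```

In this toy run only the grid perimeters are transmitted (36 of the 48 points), and the two halves of the split cluster end up with the same global label even though neither site ever sees the other's raw data. The quadratic loops are kept for readability; a practical version would use a spatial index.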
Year: 2018
DOI: 10.1016/j.neucom.2017.11.014
Venue: NEUROCOMPUTING
Keywords: Distributed clustering, DBSCAN, Cluster boundary, Density gradient
Field: Data mining, Fuzzy clustering, CURE data clustering algorithm, Computer science, Artificial intelligence, Cluster analysis, Single-linkage clustering, Canopy clustering algorithm, Data stream clustering, Pattern recognition, Correlation clustering, Constrained clustering, Machine learning
DocType: Journal
Volume: 275
ISSN: 0925-2312
Citations: 2
PageRank: 0.36
References: 15
Authors: 3
Name | Order | Citations | PageRank
Tong Qiuhui | 1 | 7 | 1.49
Li X | 2 | 240 | 34.58
Yuan Bo | 3 | 532 | 47.01