Title
A highly scalable clustering scheme using boundary information
Abstract
Many advanced clustering techniques are effective at handling datasets in complicated situations. However, on large datasets, which are increasingly common in the era of big data, the running time of most existing techniques can quickly become intolerable. To tackle this challenge, we propose Scalable Clustering Using Boundary Information (SCUBI), a highly flexible and scalable clustering scheme. The idea of SCUBI is to first identify the boundary points of the original dataset and then group these boundary points into suitable clusters using an existing clustering technique. Finally, each remaining point is assigned to the same cluster as its nearest boundary point. To demonstrate the effectiveness and scalability of SCUBI, we plug the well-known DBSCAN algorithm into SCUBI. Comprehensive experiments on datasets with up to two million data points compare the clustering results and time efficiency of DBSCAN and SCUBI-DBSCAN. The results show that our method obtains clustering results almost identical to those of standard DBSCAN while achieving orders-of-magnitude speedup, especially on large datasets, which confirms the scalability of SCUBI. Experiments are also performed with other clustering algorithms of high time complexity to verify the flexibility of SCUBI. © 2017 Elsevier B.V.
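The three steps described in the abstract (find boundary points, cluster only those, assign every other point to its nearest boundary point) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how boundary points are detected, so `boundary_score` and its `threshold` are hypothetical stand-ins (a point whose nearest neighbours lie mostly on one side gets a high score), and the small brute-force `dbscan` is only there to keep the example self-contained.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (brute force)."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]

def boundary_score(points, i, k):
    """ASSUMED heuristic, not from the paper: length of the mean offset to the
    k nearest neighbours, scaled by their mean distance. Interior points have
    neighbours on all sides, so their offsets cancel and the score is near 0."""
    nbrs = knn(points, i, k)
    dim = len(points[i])
    mean_offset = [sum(points[j][d] - points[i][d] for j in nbrs) / len(nbrs)
                   for d in range(dim)]
    mean_dist = sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)
    return math.hypot(*mean_offset) / mean_dist if mean_dist > 0 else 0.0

def dbscan(points, eps, min_pts):
    """Plain brute-force DBSCAN; returns one label per point, -1 for noise."""
    def region(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]
    labels = [None] * len(points)          # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1                 # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster        # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = region(j)
            if len(reach) >= min_pts:      # j is a core point: keep expanding
                seeds.extend(reach)
    return labels

def scubi(points, base_cluster, k=8, threshold=0.2):
    """Sketch of the scheme: base_cluster is any function mapping a list of
    points to a list of labels; k and threshold are hypothetical parameters."""
    n = len(points)
    # Step 1: pick boundary points.
    boundary = [i for i in range(n) if boundary_score(points, i, k) >= threshold]
    if not boundary:
        raise ValueError("no boundary points found; lower the threshold")
    # Step 2: run the expensive clustering algorithm on the boundary only.
    b_labels = base_cluster([points[i] for i in boundary])
    labels = [None] * n
    for pos, i in enumerate(boundary):
        labels[i] = b_labels[pos]
    # Step 3: every other point inherits the label of its nearest boundary
    # point (including -1 if that boundary point was marked as noise).
    for i in range(n):
        if labels[i] is None:
            nearest = min(range(len(boundary)),
                          key=lambda p: math.dist(points[i], points[boundary[p]]))
            labels[i] = b_labels[nearest]
    return labels

# Toy data: two 10 x 10 grids of points, 100 units apart.
grid_a = [(x, y) for x in range(10) for y in range(10)]
grid_b = [(x + 100, y) for x in range(10) for y in range(10)]
points = grid_a + grid_b
labels = scubi(points, lambda pts: dbscan(pts, eps=1.5, min_pts=3))
print(sorted(set(labels)))
```

In this toy case only the outer ring of each grid is passed to `dbscan`, which is where the saving would come from on larger data. The brute-force neighbour search used here is quadratic and would need a spatial index to be of any practical use.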
Year: 2017
DOI: 10.1016/j.patrec.2017.01.016
Venue: Pattern Recognition Letters
Keywords: Cluster boundary, Clustering, DBSCAN, Density gradient
Field: OPTICS algorithm, Data mining, Canopy clustering algorithm, CURE data clustering algorithm, Data stream clustering, Correlation clustering, SUBCLU, Cluster analysis, Mathematics, DBSCAN
DocType: Journal
Volume: 89
ISSN: 0167-8655
Citations: 5
PageRank: 0.45
References: 8
Authors: 3
Name         Order  Citations  PageRank
Tong Qiuhui  1      7          1.49
Li X         2      240        34.58
Yuan Bo      3      532        47.01