Title
A distance based clustering method for arbitrary shaped clusters in large datasets
Abstract
Clustering is widely used across science, technology, and the social sciences. Clusters in a dataset naturally take arbitrary (non-convex) shapes. Distance based methods are one important class of clustering methods, but they usually find only clusters of convex shape. Classical single-link is a distance based clustering method that can find arbitrary shaped clusters. However, it scans the dataset multiple times and has a time requirement of O(n^2), where n is the size of the dataset, which is potentially a severe problem for a large dataset. In this paper, we propose a distance based clustering method, l-SL, to find arbitrary shaped clusters in a large dataset. In this method, the leaders clustering method is first applied to the dataset to derive a set of leaders; the single-link method (with a distance stopping criterion) is then applied to the set of leaders to obtain the final clustering. The l-SL method produces a flat clustering. It is considerably faster than the single-link method applied to the dataset directly. The clustering result of l-SL may deviate nominally from the final clustering of the single-link method (with a distance stopping criterion) applied to the dataset directly. To compensate for this deviation, an improvement method is also proposed. Experiments are conducted with standard real world and synthetic datasets. Experimental results show the effectiveness of the proposed clustering methods for large datasets.
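The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`leaders`, `single_link_cut`, `l_sl`), the parameter names `tau` (leader threshold) and `h` (single-link stopping distance), and the use of Euclidean distance are all assumptions made for illustration. How `tau` should relate to `h` is left to the caller, and the improvement method mentioned in the abstract is not covered. Single-link with a distance stopping criterion is computed here as connected components of the graph that links items at distance at most `h`, which yields the same flat clustering.

```python
import math


def leaders(points, tau):
    """One pass over the data: a point follows the first leader within tau,
    otherwise it becomes a new leader (tau is an assumed parameter name)."""
    leader_idx = []   # indices into `points` of the chosen leaders
    follower_of = []  # follower_of[i] = position in leader_idx of point i's leader
    for i, p in enumerate(points):
        for pos, li in enumerate(leader_idx):
            if math.dist(p, points[li]) <= tau:
                follower_of.append(pos)
                break
        else:
            # No leader is close enough, so this point starts a new group.
            leader_idx.append(i)
            follower_of.append(len(leader_idx) - 1)
    return leader_idx, follower_of


def single_link_cut(items, h):
    """Single-link stopped at distance h, computed as connected components
    of the graph joining items whose distance is at most h (union-find)."""
    parent = list(range(len(items)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            if math.dist(items[a], items[b]) <= h:
                parent[find(a)] = find(b)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(x), len(ids)) for x in range(len(items))]


def l_sl(points, tau, h):
    """Cluster the leaders with single-link, then give every point
    the cluster label of its leader (a flat clustering)."""
    leader_idx, follower_of = leaders(points, tau)
    leader_labels = single_link_cut([points[i] for i in leader_idx], h)
    return [leader_labels[pos] for pos in follower_of]


if __name__ == "__main__":
    data = [(0, 0), (0.1, 0), (1, 0), (1.1, 0), (10, 0), (10.1, 0)]
    print(l_sl(data, tau=0.5, h=1.5))
```

The quadratic pairwise loop runs only over the leaders rather than over all n points, which is where the speed-up described in the abstract would come from when the number of leaders is much smaller than n.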
Year: 2011
DOI: 10.1016/j.patcog.2011.04.027
Venue: Pattern Recognition
Keywords: final clustering, improvement method, large datasets, l-sl method, large dataset, hybrid clustering method, distance based clustering, flat clustering, arbitrary shaped clusters, arbitrary shaped cluster, dataset multiple time, single-link method, clustering method, proposed clustering method, leaders, single-link, social science
Field: Hierarchical clustering, k-medians clustering, CURE data clustering algorithm, Complete-linkage clustering, Pattern recognition, Correlation clustering, Determining the number of clusters in a data set, Artificial intelligence, Cluster analysis, Mathematics, Machine learning, Single-linkage clustering
DocType: Journal
Volume: 44
Issue: 12
ISSN: Pattern Recognition
Citations: 15
PageRank: 0.68
References: 24
Authors: 3
Name, Order, Citations, PageRank
Bidyut Kr. Patra, 1, 90, 10.17
Sukumar Nandi, 2, 530, 89.50
P. Viswanath, 3, 148, 11.77