Abstract |
---|
Clustering has been widely used in many fields of science, technology, and social science. In practice, clusters in a dataset can have arbitrary (non-convex) shapes. One important class of clustering methods is distance-based; however, distance-based clustering methods usually find clusters of convex shape. The classical single-link method is a distance-based clustering method that can find arbitrarily shaped clusters. It scans the dataset multiple times and has a time requirement of O(n^2), where n is the size of the dataset, which is potentially a severe problem for large datasets. In this paper, we propose a distance-based clustering method, l-SL, to find arbitrarily shaped clusters in a large dataset. In this method, the leaders clustering method is first applied to the dataset to derive a set of leaders; the single-link method (with a distance stopping criterion) is then applied to the set of leaders to obtain the final clustering. The l-SL method produces a flat clustering and is considerably faster than the single-link method applied to the dataset directly. The clustering result of l-SL may deviate marginally from the final clustering of the single-link method (with a distance stopping criterion) applied to the dataset directly. To compensate for this deviation, an improvement method is also proposed. Experiments are conducted with standard real-world and synthetic datasets. Experimental results show the effectiveness of the proposed clustering methods for large datasets. |
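The two-stage l-SL pipeline described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `leaders`, `single_link`, `l_sl`, and the parameters `tau` (leader threshold) and `h` (single-link stopping distance) are hypothetical, Euclidean distance is assumed, and the relationship between `tau` and `h` used in the paper is not given in the abstract. Single-link with a distance stopping criterion is computed here as the connected components of the graph whose edges join points at distance at most `h`, which yields the same flat clustering as merging clusters while the closest pair is within `h`. The improvement method mentioned in the abstract is not modeled.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def leaders(data: Sequence[Point], tau: float) -> Tuple[List[Point], List[int]]:
    """Leaders clustering (single scan): a point within distance tau of an
    existing leader joins the first such leader; otherwise it becomes a new leader.
    Returns the leader list and, for each point, the index of its leader."""
    lead: List[Point] = []
    assign: List[int] = []
    for x in data:
        for i, l in enumerate(lead):
            if math.dist(x, l) <= tau:
                assign.append(i)
                break
        else:  # no leader close enough: x starts a new leader
            lead.append(x)
            assign.append(len(lead) - 1)
    return lead, assign


def single_link(points: Sequence[Point], h: float) -> List[int]:
    """Flat single-link clustering with stopping distance h, computed as
    connected components of the 'distance <= h' graph using union-find."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= h:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels: dict = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


def l_sl(data: Sequence[Point], tau: float, h: float) -> List[int]:
    """Sketch of l-SL: cluster the leaders with single-link, then give every
    point the cluster label of its leader."""
    lead, assign = leaders(data, tau)
    leader_labels = single_link(lead, h)
    return [leader_labels[a] for a in assign]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (10.0, 0.0), (10.4, 0.0), (20.0, 0.0)]
    print(l_sl(data, tau=0.6, h=1.5))       # [0, 0, 0, 1, 1, 2]
    print(single_link(data, h=1.5))         # [0, 0, 0, 1, 1, 2]
```

In this sketch the leaders scan costs about O(n·m) distance computations and single-link on the leaders about O(m^2), where m is the number of leaders, instead of O(n^2) for single-link on the full dataset; the saving depends on m being much smaller than n. On this toy input the two clusterings agree, but with a larger `tau` a point can be absorbed by a leader that single-link on the raw data would place in a different cluster, which is the kind of deviation the abstract's improvement method targets.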
Year | DOI | Venue |
---|---|---|
2011 | 10.1016/j.patcog.2011.04.027 | Pattern Recognition |
Keywords | Field | DocType |
---|---|---|
final clustering,improvement method,large datasets,l-sl method,large dataset,hybrid clustering method,distance based clustering,flat clustering,arbitrary shaped clusters,arbitrary shaped cluster,dataset multiple time,single-link method,clustering method,proposed clustering method,leaders,single-link,social science | Hierarchical clustering,k-medians clustering,CURE data clustering algorithm,Complete-linkage clustering,Pattern recognition,Correlation clustering,Determining the number of clusters in a data set,Artificial intelligence,Cluster analysis,Mathematics,Machine learning,Single-linkage clustering | Journal |
Volume | Issue | ISSN |
---|---|---|
44 | 12 | Pattern Recognition |
Citations | PageRank | References |
---|---|---|
15 | 0.68 | 24 |
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Bidyut Kr. Patra | 1 | 90 | 10.17 |
Sukumar Nandi | 2 | 530 | 89.50 |
P. Viswanath | 3 | 148 | 11.77 |