Title
SPARCL: Efficient and Effective Shape-Based Clustering
Abstract
Clustering is one of the fundamental data mining tasks. Many different clustering paradigms have been developed over the years, including partitional, hierarchical, mixture-model-based, density-based, spectral, and subspace methods. The focus of this paper is on full-dimensional, arbitrarily shaped clusters. Existing methods for this problem suffer from high memory or time complexity (quadratic or even cubic), which has restricted them to datasets of moderate size. In this paper we propose SPARCL, a simple and scalable algorithm for finding clusters with arbitrary shapes and sizes that has linear space and time complexity. SPARCL consists of two stages: the first runs a carefully initialized version of the K-means algorithm to generate many small seed clusters, and the second iteratively merges the generated clusters to obtain the final shape-based clusters. Experiments were conducted on a variety of datasets to highlight the effectiveness, efficiency, and scalability of our approach. On large datasets, SPARCL is an order of magnitude faster than the best existing approaches.
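The abstract names the two stages but gives no details of either, so the following is only a rough sketch of the overall shape of such a pipeline, not the authors' method. It assumes farthest-first seeding as the "careful initialization" and single-link distance between seed centers as the merge criterion; both are stand-ins chosen for brevity. All function names (`farthest_first_seeds`, `kmeans`, `merge_seeds`, `sparcl_sketch`) are hypothetical.

```python
import math
import random


def farthest_first_seeds(points, k, rng):
    """Pick k spread-out initial centers (assumed initialization, not from the paper)."""
    centers = [points[rng.randrange(len(points))]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest already-chosen center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]


def kmeans(points, k, rng, iters=20):
    """Stage 1: plain Lloyd iterations producing many small seed clusters."""
    centers = farthest_first_seeds(points, k, rng)
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty seed keeps its previous center
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assign(points, centers)


def merge_seeds(centers, n_clusters):
    """Stage 2 (stand-in criterion): repeatedly merge the two groups of seeds
    whose closest pair of seed centers is nearest, until n_clusters remain."""
    groups = [{j} for j in range(len(centers))]
    while len(groups) > n_clusters:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = min(math.dist(centers[i], centers[j])
                        for i in groups[a] for j in groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = groups.pop(b)  # b > a, so index a stays valid
        groups[a] |= merged
    return {j: c for c, group in enumerate(groups) for j in group}


def sparcl_sketch(points, n_seeds, n_clusters, seed=0):
    """Run both stages and return one final cluster label per point."""
    rng = random.Random(seed)
    centers, seed_labels = kmeans(points, n_seeds, rng)
    seed_to_cluster = merge_seeds(centers, n_clusters)
    return [seed_to_cluster[s] for s in seed_labels]


# Usage: two elongated, well-separated line segments of 41 points each.
line_a = [(0.25 * i, 0.0) for i in range(41)]
line_b = [(0.25 * i, 12.0) for i in range(41)]
labels = sparcl_sketch(line_a + line_b, n_seeds=8, n_clusters=2)
print(labels[:3], labels[-3:])
```

The merge loop here is deliberately naive and scales poorly in the number of seeds; it makes no attempt to reproduce the linear complexity claimed in the abstract.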
Year
2008
DOI
10.1109/ICDM.2008.73
Venue
Pisa
Keywords
computational complexity, data mining, iterative methods, pattern clustering, K-means algorithm, SPARCL algorithm, data mining, full-dimensional arbitrary shaped cluster, iterative method, linear space complexity, memory complexity, shape-based clustering, time complexity
Field
Data mining, Subspace topology, Iterative method, Computer science, Linear space, Artificial intelligence, Time complexity, Cluster analysis, Mixture model, Machine learning, Computational complexity theory, Scalability
DocType
Conference
ISSN
1550-4786
ISBN
978-0-7695-3502-9
Citations
12
PageRank
0.92
References
9
Authors
4
Name                  Order  Citations  PageRank
Vineet Chaoji         1      428        19.50
Mohammad Al Hasan     2      427        35.08
Saeed Salem           3      182        17.39
Mohammed Javeed Zaki  4      7972       536.24