Abstract |
---|
The leading partitional clustering technique, k-means, is one of the most computationally efficient clustering methods. However, it produces a locally optimal solution that strongly depends on its initial seeds. Bad initial seeds can also cause natural clusters to be split or merged even if the clusters are well separated. In this paper, we propose ROBIN, a novel method for initial seed selection in k-means-type algorithms. It imposes constraints on the chosen seeds that lead to better clustering when k-means converges. The constraints make the seed selection method insensitive to outliers in the data and also help it handle variable-density or multi-scale clusters. Furthermore, the constraints make the method deterministic, so a single run suffices to obtain good initial seeds, whereas traditional random seed selection approaches need many runs to obtain seeds that lead to satisfactory clustering. We comprehensively evaluate ROBIN against state-of-the-art seeding methods on a wide range of synthetic and real datasets and show that ROBIN consistently outperforms existing approaches in terms of clustering quality. |
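
The seed sensitivity described in the abstract can be illustrated with a small sketch. This is not ROBIN (its constraints are not specified in this record); it is a plain Lloyd-style k-means written for illustration, and the toy data, the `kmeans` helper, and both seed sets are assumptions chosen by hand. With one seed per natural cluster the run recovers the three groups, while two seeds placed in the same group leave that group split and the other two merged.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points: Sequence[Point], seeds: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int], float]:
    """Plain Lloyd iterations started from the given seeds (illustrative helper)."""
    centroids = list(seeds)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: a local optimum was reached
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, labels) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, labels))
    return centroids, labels, sse


# Hypothetical toy data: three tight, well-separated groups of four points each.
points: List[Point] = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (50, 0), (50, 1), (51, 0), (51, 1),
    (60, 0), (60, 1), (61, 0), (61, 1),
]

good_seeds = [(0, 0), (50, 0), (60, 0)]  # one seed per natural group
bad_seeds = [(0, 0), (0, 1), (50, 0)]    # two seeds fall in the same group

_, labels_good, sse_good = kmeans(points, good_seeds)
_, labels_bad, sse_bad = kmeans(points, bad_seeds)

print(labels_good, sse_good)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2] 6.0
print(labels_bad, sse_bad)    # [0, 1, 0, 1, 2, 2, 2, 2, 2, 2, 2, 2] 205.0
```

Both runs converge, but only the first reaches the low-error partition; the second stays at a worse local optimum, which is the failure mode that careful seed selection is meant to avoid.
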
Year | DOI | Venue |
---|---|---|
2009 | 10.1016/j.patrec.2009.04.013 | Pattern Recognition Letters |
Keywords | Field | DocType |
k-means,partitional clustering,seed selection,good initial seed,initial seed,initial seed selection,bad initial seed,robust initialization,chosen seed,better clustering,clustering quality,computationally efficient clustering method,k -means,robust partitional,density insensitive seeding,good seed,satisfactory clustering,k means | k-means clustering,CURE data clustering algorithm,Correlation clustering,Pattern recognition,Outlier,Algorithm,Constrained clustering,Artificial intelligence,Random seed,Cluster analysis,Deterministic system (philosophy),Mathematics | Journal |
Volume | Issue | ISSN |
30 | 11 | 0167-8655 |
Citations | PageRank | References |
22 | 0.95 | 16 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mohammad Al Hasan | 1 | 427 | 35.08 |
Vineet Chaoji | 2 | 428 | 19.50 |
Saeed Salem | 3 | 182 | 17.39 |
Mohammed Javeed Zaki | 4 | 7972 | 536.24 |