Abstract | ||
---|---|---|
Large-scale clustering has found wide applications in many fields and received much attention in recent years. However, most existing large-scale clustering methods can only achieve mediocre performance, because they are sensitive to the unavoidable presence of noise in the large-scale data. To address this challenging problem, we thus propose a large-scale sparse clustering (LSSC) algorithm. In this paper, we choose a two-step optimization strategy for large-scale sparse clustering: 1) k-means clustering over the large-scale data to obtain the initial clustering results; 2) clustering refinement over the initial results by developing a spare coding algorithm. To guarantee the scalability of the second step for large-scale data, we also utilize nonlinear approximation and dimension reduction techniques to speed up the sparse coding algorithm. Experimental results on both synthetic and real-world datasets demonstrate the promising performance of our LSSC algorithm. |
Year | Venue | Field |
---|---|---|
2016 | IJCAI | Data mining,Canopy clustering algorithm,CURE data clustering algorithm,Clustering high-dimensional data,Data stream clustering,Correlation clustering,Computer science,Determining the number of clusters in a data set,Constrained clustering,Artificial intelligence,Cluster analysis,Machine learning |
DocType | Citations | PageRank |
Conference | 1 | 0.36 |
References | Authors | |
18 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ruqi Zhang | 1 | 1 | 0.70 |
Zhiwu Lu | 2 | 479 | 47.42 |