Abstract | ||
---|---|---|
Kernel clustering algorithms have the ability to capture the non-linear structure inherent in many real world data sets and thereby, achieve better clustering performance than Euclidean distance based clustering algorithms. However, their quadratic computational complexity renders them non-scalable to large data sets. In this paper, we employ random Fourier maps, originally proposed for large scale classification, to accelerate kernel clustering. The key idea behind the use of random Fourier maps for clustering is to project the data into a low-dimensional space where the inner product of the transformed data points approximates the kernel similarity between them. An efficient linear clustering algorithm can then be applied to the points in the transformed space. We also propose an improved scheme which uses the top singular vectors of the transformed data matrix to perform clustering, and yields a better approximation of kernel clustering under appropriate conditions. Our empirical studies demonstrate that the proposed schemes can be efficiently applied to large data sets containing millions of data points, while achieving accuracy similar to that achieved by state-of-the-art kernel clustering algorithms. |
Year | DOI | Venue |
---|---|---|
2012 | 10.1109/ICDM.2012.61 | ICDM |
Keywords | Field | DocType |
random fourier features,large data,real world data set,large data set,efficient linear clustering algorithm,data point,state-of-the-art kernel,clustering performance,random fourier map,efficient kernel clustering,kernel similarity,kernel clustering,computational complexity,fourier transforms | Canopy clustering algorithm,Fuzzy clustering,CURE data clustering algorithm,Clustering high-dimensional data,Data stream clustering,Correlation clustering,Pattern recognition,Computer science,Artificial intelligence,Biclustering,Cluster analysis,Machine learning | Conference |
ISSN | Citations | PageRank |
1550-4786 | 7 | 0.50 |
References | Authors | |
0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Radha Chitta | 1 | 177 | 8.01 |
Rong Jin | 2 | 6206 | 334.26 |
Anil Jain | 3 | 33507 | 3334.84 |