Title
Approximate kernel k-means: solution to large scale kernel clustering
Abstract
Digital data explosion mandates the development of scalable tools to organize the data in a meaningful and easily accessible form. Clustering is a commonly used tool for data organization. However, many clustering algorithms designed to handle large data sets assume linear separability of data and hence do not perform well on real world data sets. While kernel-based clustering algorithms can capture the non-linear structure in data, they do not scale well in terms of speed and memory requirements when the number of objects to be clustered exceeds tens of thousands. We propose an approximation scheme for kernel k-means, termed approximate kernel k-means, that reduces both the computational complexity and the memory requirements by employing a randomized approach. We show both analytically and empirically that the performance of approximate kernel k-means is similar to that of the kernel k-means algorithm, but with dramatically reduced run-time complexity and memory requirements.
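The abstract does not spell out the randomized scheme, so the following is only a minimal sketch under stated assumptions: cluster centers are restricted to the span of m randomly sampled points, so that only an n x m slice of the kernel matrix is needed instead of the full n x n matrix. The function names (rbf_kernel, approx_kernel_kmeans), the RBF kernel, the pseudo-inverse, and all parameter defaults are illustrative choices, not details taken from this record.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel between the rows of X and the rows of Y."""
    sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)


def approx_kernel_kmeans(X, n_clusters, n_samples, gamma=1.0, n_iter=100, seed=0):
    """Sketch of a sampling-based approximation to kernel k-means.

    Assumption: each center is a linear combination of the feature-space
    images of n_samples randomly chosen points.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Randomly pick the points that will span the cluster centers.
    idx = rng.choice(n, size=n_samples, replace=False)
    K_B = rbf_kernel(X, X[idx], gamma)   # n x m: all points vs. sampled points
    K_hat = K_B[idx]                     # m x m: sampled points vs. themselves
    K_hat_inv = np.linalg.pinv(K_hat)    # pseudo-inverse for numerical safety

    labels = rng.integers(n_clusters, size=n)  # random initial partition
    for _ in range(n_iter):
        # Row-normalized membership matrix (n_clusters x n).
        U = np.zeros((n_clusters, n))
        U[labels, np.arange(n)] = 1.0
        counts = U.sum(axis=1, keepdims=True)
        U /= np.maximum(counts, 1.0)

        # Least-squares coefficients of each center over the sampled points.
        alpha = U @ K_B @ K_hat_inv      # n_clusters x m

        # Squared distance to each center, dropping the k(x_i, x_i) term,
        # which is the same for every cluster and does not affect the argmin.
        center_norm = ((alpha @ K_hat) * alpha).sum(axis=1)
        dist = center_norm[None, :] - 2.0 * (K_B @ alpha.T)
        dist[:, counts.ravel() == 0] = np.inf  # never assign to an empty cluster

        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

In this sketch the kernel storage is n x m rather than n x n, which is where the reduction in memory and run time would come from when m is much smaller than n. Calling approx_kernel_kmeans(X, n_clusters=2, n_samples=10) on an array X of shape (n, d) returns one integer label per row.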
Year
2011
DOI
10.1145/2020408.2020558
Venue
KDD
Keywords
large scale kernel clustering, digital data explosion, real world data set, large data set, data organization, kernel k-means, memory requirement, kernel k-means algorithm, approximate kernel k-means, kernel-based clustering algorithm, computational complexity, time complexity, clustering, k means algorithm, k means
Field
Data mining, Radial basis function kernel, Computer science, Tree kernel, Theoretical computer science, Polynomial kernel, Artificial intelligence, Cluster analysis, String kernel, Kernel embedding of distributions, Kernel method, Variable kernel density estimation, Machine learning
DocType
Conference
Citations
69
PageRank
2.21
References
36
Authors
4
Name                Order   Citations   PageRank
Radha Chitta        1       177         8.01
Rong Jin            2       6206        334.26
Timothy C. Havens   3       332         15.29
Anil Jain           4       33507       3334.84