Title
Spectral Clustering with Perturbed Data
Abstract
Spectral clustering is useful for a wide-ranging set of appl ications in areas such as biological data analysis, image processing and data mining. However, the com- putational and/or communication resources required by the method in processing large-scale data are often prohibitively high, and practit ioners are often required to perturb the original data in various ways (quantization, downsampling, etc) before invoking a spectral algorithm. In this paper, we use stochastic perturbation theory to study the effects of data perturbation on the performance of spectral clustering. We show that the error under perturbation of spectral clustering is closely related to the perturbation of the eigenvectors of the Laplacian matrix. From this result we derive approximate upper bounds on the clustering error. We show that this bound is tight empirically across a wide range of problems, suggesting that it can be used in practical settings to determine the amount of data reduction allowed in order to meet a specification of permitted loss in clustering performance.
Year
Venue
Keywords
2008
NIPS
laplacian matrix,data mining,data reduction,perturbation theory,spectral clustering,upper bound,eigenvectors,biological data
Field
DocType
Citations 
Spectral clustering,Canopy clustering algorithm,Mathematical optimization,CURE data clustering algorithm,Clustering high-dimensional data,Data stream clustering,Correlation clustering,Computer science,Constrained clustering,Cluster analysis
Conference
42
PageRank 
References 
Authors
1.89
10
4
Name
Order
Citations
PageRank
Ling Huang12496118.80
Donghui Yan21809.18
Michael I. Jordan3312203640.80
Nina Taft42109154.92