Title
Distributed Kernel Principal Component Analysis.
Abstract
Kernel Principal Component Analysis (KPCA) is a key technique in machine learning for extracting the nonlinear structure of data and pre-processing it for downstream learning algorithms. We study the distributed setting in which there are multiple workers, each holding a set of points, who wish to compute the principal components of the union of their pointsets. Our main result is a communication efficient algorithm that takes as input arbitrary data points and computes a set of global principal components, that give relative-error approximation for polynomial kernels, or give relative-error approximation with an arbitrarily small additive error for a wide family of kernels including Gaussian kernels. While recent work shows how to do PCA in a distributed setting, the setting is significantly more challenging. Although the kernel trick is useful for efficient computation, it is unclear how to use it to reduce communication. The main problem with previous work is that it achieves communication proportional to the dimension of the data points, which would be proportional to the dimension of the feature space, or to the number of examples, both of which could be very large. We instead first select a small subset of points whose span contains a good approximation (the column subset selection problem, CSS), and then use sketching for low rank approximation to achieve our result. The column subset selection is done using a careful combination of oblivious subspace embeddings for kernels, oblivious leverage score approximation, and adaptive sampling. These lead to nearly optimal communication bound for CSS, and also lead to nearly optimal communication for KPCA in the constant approximation region. Experiments on large scale real world datasets show the efficacy of our algorithm.
Year
Venue
Field
2015
arXiv: Learning
Kernel (linear algebra),Data point,Feature vector,Mathematical optimization,Subspace topology,Kernel principal component analysis,Low-rank approximation,Artificial intelligence,Kernel method,Machine learning,Principal component analysis,Mathematics
DocType
Volume
Citations 
Journal
abs/1503.06858
0
PageRank 
References 
Authors
0.34
0
5
Name
Order
Citations
PageRank
Maria-Florina Balcan11445105.01
Yingyu Liang239331.39
Le Song32437159.27
David P. Woodruff42156142.38
Bo Xie 00025945.19