Title
Two-Way Kernel Matrix Puncturing: Towards Resource-Efficient PCA and Spectral Clustering
Abstract
The article introduces an elementary cost and storage reduction method for spectral clustering and principal component analysis. The method consists in randomly "puncturing" both the data matrix X ∈ ℂ^{p×n} (or ℝ^{p×n}) and its corresponding kernel (Gram) matrix K through Bernoulli masks: S ∈ {0,1}^{p×n} for X and B ∈ {0,1}^{n×n} for K. The resulting "two-way punctured" kernel is thus given by K = (1/p) [(X ⊙ S)ᴴ (X ⊙ S)] ⊙ B. We demonstrate that, for X composed of independent columns drawn from a Gaussian mixture model, as n, p → ∞ with p/n → c₀ ∈ (0, ∞), the spectral behavior of K (its limiting eigenvalue distribution, as well as its isolated eigenvalues and eigenvectors) is fully tractable and exhibits a series of counter-intuitive phenomena. We notably prove, and empirically confirm on various real image databases, that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains, for a virtually constant clustering or PCA performance. This preliminary study thus opens the path towards rethinking, from a large-dimensional standpoint, the computational and storage costs of elementary machine learning models.
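A minimal NumPy sketch of the two-way puncturing construction described in the abstract. The dimensions, mask keep-probabilities (eps_s, eps_b), and the Gaussian data are illustrative assumptions, not values from the paper; the kernel mask is symmetrized so that K remains Hermitian.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 200, 400          # dimensions (illustrative assumption)
eps_s, eps_b = 0.4, 0.4  # Bernoulli keep-probabilities (hypothetical)

X = rng.standard_normal((p, n))   # data matrix (Gaussian, for illustration)
S = rng.random((p, n)) < eps_s    # Bernoulli mask on the data entries
B = rng.random((n, n)) < eps_b    # Bernoulli mask on the kernel entries
B = np.triu(B) | np.triu(B, 1).T  # symmetrize the kernel mask
np.fill_diagonal(B, True)         # keep the diagonal of the kernel

Xs = X * S                        # punctured data  X ⊙ S
K = (Xs.T @ Xs) / p * B           # two-way punctured kernel
```

Only the stored entries of Xs and K need to be kept, which is where the computational and storage gains come from; clustering or PCA then proceeds on the eigenvectors of K as usual.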
Year: 2021
Venue: INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139
DocType: Conference
Volume: 139
ISSN: 2640-3498
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name               Order  Citations  PageRank
Romain Couillet    1      692        74.03
Florent Chatelain  2      7          2.66
Nicolas Le Bihan   3      254        23.35