Title
Performance-Complexity Trade-Off in Large Dimensional Statistics
Abstract
This article introduces a random matrix framework for the analysis of the trade-off between performance and complexity in a class of machine learning algorithms, under a large dimensional data X = [x <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sub> , ..., x <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</sub> ] ∈ R <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p×n</sup> regime. Specifically, we analyze the spectral properties of K ⊙ B ∈ R <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n×n</sup> , for the kernel random matrix K=-[1/p]X <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">T</sup> X upon which a sparsity mask B ∈ {0,1} <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n×n</sup> is applied: this reduces the number of K <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">ij</sub> to evaluate, thereby reducing complexity, while weakening the power of statistical inference on K, thereby impeding performance. Assuming the data structured as X = Z + √(nμv) <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">T</sup> for informative vectors μ ∈ R <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">p</sup> , v ∈ R <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</sup> , and white noise Z, we exhibit a phase transition phenomenon below which spectral methods must fail and which is a function of the sparsity structure of B. This finds immediate applications to the fundamental limits of complexity-reduced spectral clustering as well as principal component analysis.
Year
DOI
Venue
2020
10.1109/MLSP49062.2020.9231568
2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP)
Keywords
DocType
ISSN
Random matrix theory,large dimensional statistics,spectral clustering,PCA
Conference
1551-2541
ISBN
Citations 
PageRank 
978-1-7281-6663-6
0
0.34
References 
Authors
1
4
Name
Order
Citations
PageRank
Tayeb Zarrouk100.34
Romain Couillet269274.03
Florent Chatelain300.34
Nicolas Le Bihan425423.35