Title
Stochastic Canonical Correlation Analysis
Abstract
We study the sample complexity of canonical correlation analysis (CCA), i.e., the number of samples needed to estimate the population canonical correlation and directions up to arbitrarily small error. With mild assumptions on the data distribution, we show that in order to achieve epsilon-suboptimality in a properly defined measure of alignment between the estimated canonical directions and the population solution, we can solve the empirical objective exactly with N(epsilon, Delta, gamma) samples, where Delta is the singular value gap of the whitened cross-covariance matrix and 1/gamma is an upper bound of the condition number of auto-covariance matrices. Moreover, we can achieve the same learning accuracy by drawing the same level of samples and solving the empirical objective approximately with a stochastic optimization algorithm; this algorithm is based on the shift-and-invert power iterations and only needs to process the dataset for O (log 1/c) passes. Finally, we show that, given an estimate of the canonical correlation, the streaming version of the shift-and-invert power iterations achieves the same learning accuracy with the same level of sample complexity, by processing the data only once.
Year
Venue
Keywords
2017
JOURNAL OF MACHINE LEARNING RESEARCH
Canonical correlation analysis,sample complexity,shift-and-invert preconditioning,streaming CCA
Field
DocType
Volume
Population,Mathematical optimization,Condition number,Singular value,Matrix (mathematics),Stochastic optimization algorithm,Upper and lower bounds,Canonical correlation,Artificial intelligence,Sample complexity,Machine learning,Mathematics
Journal
20
Issue
ISSN
Citations 
167
1532-4435
1
PageRank 
References 
Authors
0.37
8
5
Name
Order
Citations
PageRank
Chao Gao110.37
Dan Garber211411.58
Nathan Srebro33892349.42
Jialei Wang47710.29
Weiran Wang51149.99