Title
ApproxCCA: An approximate correlation analysis algorithm for multidimensional data streams
Abstract
Correlation analysis is a significant challenge in mining multidimensional data streams. Existing correlation analysis methods for data stream mining focus mainly on one-dimensional data streams, so the identification of underlying correlations among multivariate arrays (e.g., sensor data) has long been neglected, and canonical correlation analysis (CCA) has rarely been applied to multidimensional data streams. In this study, a novel CCA-based correlation analysis algorithm, called ApproxCCA, is proposed to explore the correlations between two multidimensional data streams in resource-constrained environments. By introducing unequal probability sampling and low-rank approximation to reduce the dimensionality of the product matrix composed of the sample covariance matrix and the sample variance matrix, ApproxCCA improves computational efficiency while preserving analytical precision. Experimental results on synthetic and real data sets indicate that ApproxCCA overcomes the computational bottleneck of traditional CCA and accurately detects the correlations between two multidimensional data streams.
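The abstract does not state which sampling probabilities ApproxCCA uses, so the following is only a minimal sketch of the general idea of unequal probability sampling applied to a covariance product, not the authors' algorithm. It assumes the standard randomized matrix-multiplication estimator: row i of the row-aligned samples X (n x p) and Y (n x q) is drawn with probability proportional to ||x_i|| * ||y_i||, and each drawn outer product is rescaled so the estimate of X^T Y is unbiased. The function names (`center`, `exact_cross_product`, `sampled_cross_product`) are hypothetical, and the low-rank approximation step and the CCA eigenproblem itself are not shown.

```python
import math
import random


def _norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(t * t for t in v))


def center(rows):
    """Subtract column means so that X^T Y / (n - 1) is a sample covariance."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def exact_cross_product(X, Y):
    """Exact p x q matrix X^T Y for row-aligned samples X (n x p), Y (n x q)."""
    p, q = len(X[0]), len(Y[0])
    out = [[0.0] * q for _ in range(p)]
    for x, y in zip(X, Y):
        for a in range(p):
            for b in range(q):
                out[a][b] += x[a] * y[b]
    return out


def sampled_cross_product(X, Y, c, rng=None):
    """Unbiased Monte Carlo estimate of X^T Y from c sampled rows.

    Assumed scheme (not taken from the paper): row i is drawn with
    replacement with probability p_i proportional to ||x_i|| * ||y_i||,
    and each drawn outer product is scaled by 1 / (c * p_i), so the
    expected value equals the exact X^T Y.
    """
    rng = rng or random.Random()
    p, q = len(X[0]), len(Y[0])
    out = [[0.0] * q for _ in range(p)]
    weights = [_norm(x) * _norm(y) for x, y in zip(X, Y)]
    total = sum(weights)
    if total == 0.0:
        return out  # every outer product is zero, so X^T Y is zero
    probs = [w / total for w in weights]
    # Rows with zero probability are never drawn, so 1 / probs[i] is safe.
    for i in rng.choices(range(len(X)), weights=probs, k=c):
        scale = 1.0 / (c * probs[i])
        x, y = X[i], Y[i]
        for a in range(p):
            for b in range(q):
                out[a][b] += scale * x[a] * y[b]
    return out
```

To approximate a sample cross-covariance matrix, apply `sampled_cross_product` to `center(X)` and `center(Y)` and divide by n - 1; a larger c costs more work but lowers the variance of the estimate.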
Year
2011
DOI
10.1016/j.knosys.2011.04.003
Venue
Knowledge-Based Systems
Keywords
Multidimensional data streams, Canonical correlation analysis, Probability and statistics, Approximation, Unequal probability sampling
Field
Data mining, Data stream mining, Data set, Computer science, Matrix (mathematics), Canonical correlation, Artificial intelligence, Probability and statistics, Pattern recognition, Algorithm, Curse of dimensionality, Sampling (statistics), Covariance matrix
DocType
Journal
Volume
24
Issue
7
ISSN
0950-7051
Citations
4
PageRank
0.39
References
13
Authors
3
Name            Order  Citations  PageRank
Yong-li Wang    1      107        26.46
Gongxuan Zhang  2      94         19.89
Jiang-bo Qian   3      28         10.49