Abstract

Motivation: Modern functional genomics generates high-dimensional datasets. It is often convenient to have a single, simple number that comprehensively characterizes the relationship between pairs of such high-dimensional datasets. Matrix correlations are such numbers. They are appealing because they can be interpreted in the same way as the Pearson correlation, which is familiar to biologists. The high dimensionality of functional genomics data is, however, problematic for existing matrix correlations. The motivation of this article is twofold: (i) we introduce the idea of matrix correlations to the bioinformatics community and (ii) we propose an improved version of the most promising matrix correlation coefficient, the RV-coefficient, that circumvents the problems posed by high-dimensional data.

Results: The modified RV-coefficient can be used in high-dimensional data analysis studies as an easy-to-use measure of the information shared by two datasets. This is demonstrated through theoretical arguments, simulations and applications to two real-life examples from functional genomics, i.e., a transcriptomics example and a metabolomics example.
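The abstract does not state the formulas, so the following is only a minimal sketch in Python with NumPy, built on stated assumptions. It uses the classical RV-coefficient, RV(X, Y) = tr(XX'YY') / sqrt(tr((XX')^2) tr((YY')^2)), computed on column-centered data matrices X and Y that share the same n rows (samples). For the modified version it assumes the same ratio computed after setting the diagonals of XX' and YY' to zero; treat that as an assumed reading of the approach, not a quotation of the paper. The function names `rv_coefficient` and `modified_rv_coefficient` and the simulation sizes are hypothetical.

```python
import numpy as np


def _centered_cross_product(X):
    """Return XX' for an n-by-p array X after centering each column."""
    Xc = X - X.mean(axis=0)
    return Xc @ Xc.T


def _matrix_correlation(A, B):
    # For symmetric A and B, tr(AB) equals the sum of their elementwise product.
    return np.sum(A * B) / np.sqrt(np.sum(A * A) * np.sum(B * B))


def rv_coefficient(X, Y):
    """Classical RV-coefficient between two datasets with the same n rows."""
    return _matrix_correlation(_centered_cross_product(X),
                               _centered_cross_product(Y))


def modified_rv_coefficient(X, Y):
    """RV-coefficient with the diagonals of XX' and YY' zeroed (assumed form)."""
    A = _centered_cross_product(X)
    B = _centered_cross_product(Y)
    np.fill_diagonal(A, 0.0)  # drop the sum-of-squares terms on the diagonal
    np.fill_diagonal(B, 0.0)
    return _matrix_correlation(A, B)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, q = 100, 500, 500  # hypothetical sizes: many more variables than samples
    X = rng.standard_normal((n, p))
    Y = rng.standard_normal((n, q))  # generated independently of X
    print(f"RV     = {rv_coefficient(X, Y):.3f}")
    print(f"RV_mod = {modified_rv_coefficient(X, Y):.3f}")
```

With many more variables than samples, one would expect the classical coefficient to come out well above zero even though X and Y are generated independently, because the large diagonal entries of XX' and YY' dominate both traces. The modified coefficient, which ignores those entries, should stay much closer to zero. Both functions return 1 when a dataset is compared with itself.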
Item | Value
---|---
Year | 2009
DOI | 10.1093/bioinformatics/btn634
Venue | BIOINFORMATICS
Field | Data mining, Correlation coefficient, Clustering high-dimensional data, MATLAB, Computer science, Matrix (mathematics), Functional genomics, Genomics, Correlation, Bioinformatics, RV coefficient
DocType | Journal
Volume | 25
Issue | 3
ISSN | 1367-4803
Citations | 5
PageRank | 1.26
References | 1
Authors | 5
Name | Order | Citations | PageRank |
---|---|---|---|
Age K. Smilde | 1 | 176 | 16.49 |
Henk A. L. Kiers | 2 | 169 | 18.28 |
S. Bijlsma | 3 | 5 | 1.26 |
C. M. Rubingh | 4 | 5 | 1.26 |
M. J. Van Erk | 5 | 5 | 1.26 |