Title
Recovering key biological constituents through sparse representation of gene expression.
Abstract
Large-scale RNA expression measurements are generating enormous quantities of data. During the last two decades, many methods were developed for extracting insights regarding the interrelationships between genes from such data. The mathematical and computational perspectives that underlie these methods are usually algebraic or probabilistic. Here, we introduce an unexplored geometric viewpoint in which the expression levels of genes in multiple experiments are interpreted as vectors in a high-dimensional space. Specifically, for the expression profile of each particular gene, we find its approximation as a linear combination of the profiles of a few other genes. This method is inspired by recent developments in the realm of compressed sensing in the machine learning domain. To demonstrate the power of our approach in extracting valuable information from expression data, we independently applied it to large-scale experiments carried out on the yeast and malaria parasite whole transcriptomes. The parameters extracted from the sparse reconstruction of the expression profiles, when fed to a supervised learning platform, were used to successfully predict the relationships between genes throughout the Gene Ontology hierarchy and the protein-protein interaction map. Extensive assessment of the biological results shows high accuracy both in recovering known predictions and in yielding accurate predictions missing from the current databases. We suggest that the geometrical approach presented here is suitable for a broad range of high-dimensional experimental data.
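The core idea in the abstract, approximating one gene's expression profile as a linear combination of a few other genes' profiles, can be illustrated with a minimal sketch. The abstract does not say which solver the authors used, so the greedy matching pursuit below is a stand-in chosen for brevity, not the paper's method. The function name `sparse_approximation`, its parameters, and the toy gene names and values are all hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def sparse_approximation(profiles, target, max_genes=3, max_iter=50, tol=1e-9):
    """Greedy matching pursuit (an assumed stand-in solver).

    Approximates the unit-normalized profile of `target` using at most
    `max_genes` other unit-normalized profiles. Returns the coefficients
    per selected gene and the norm of the remaining residual.
    """
    atoms = {g: normalize(p) for g, p in profiles.items() if g != target}
    residual = normalize(profiles[target])
    coeffs = {}
    for _ in range(max_iter):
        # Once max_genes genes are in use, only refine those already chosen.
        candidates = atoms if len(coeffs) < max_genes else coeffs
        if not candidates:
            break
        # Pick the gene whose profile is most aligned with the residual.
        best = max(candidates, key=lambda g: abs(dot(residual, atoms[g])))
        step = dot(residual, atoms[best])
        if abs(step) < tol:
            break
        coeffs[best] = coeffs.get(best, 0.0) + step
        residual = [r - step * a for r, a in zip(residual, atoms[best])]
    return coeffs, math.sqrt(dot(residual, residual))


# Toy data: each gene is a vector of expression levels across 4 experiments.
profiles = {
    "geneA": [3.0, 4.0, 0.0, 0.0],
    "geneB": [1.0, 0.0, 0.0, 0.0],
    "geneC": [0.0, 2.0, 0.0, 0.0],
    "geneD": [0.0, 0.0, 5.0, 0.0],
}
coeffs, err = sparse_approximation(profiles, "geneA")
print(coeffs, err)
```

In this toy case `geneA` is reconstructed from `geneB` and `geneC` alone, while `geneD` receives no coefficient. Coefficients of this kind correspond loosely to the "parameters extracted from the sparse reconstruction" that the abstract describes feeding to a supervised learner.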
Year
2011
DOI
10.1093/bioinformatics/btr002
Venue
Bioinformatics
Keywords
supplementary data,expression level,large-scale rna expression measurement,gene expression,il supplementary information,protein interaction map,geometrical approach,high-dimensional space,high-dimensional experimental data,key biological constituent,expression data,expression profile,sparse representation
Field
Data mining,Linear combination,Experimental data,Computer science,Gene ontology,Artificial intelligence,Probabilistic logic,Compressed sensing,Sparse approximation,Gene expression,Supervised learning,Bioinformatics,Machine learning
DocType
Journal
Volume
27
Issue
5
ISSN
1367-4811
Citations
2
PageRank
0.40
References
10
Authors
4
Name             Order  Citations  PageRank
Yosef Prat       1      2          0.74
Menachem Fromer  2      143        10.47
Nati Linial      3      3872       602.77
Michal Linial    4      1502       149.92