Title
Non-negative Tensor Factorization with missing data for the modeling of gene expressions in the Human Brain
Abstract
Non-negative Tensor Factorization (NTF) has become a prominent tool for analyzing high-dimensional, multi-way structured data. In this paper we analyze gene expression across brain regions in multiple subjects, based on data from the Allen Human Brain Atlas [1], where more than 40% of the data are missing in our problem. Our analysis is based on the non-negativity constrained Canonical Polyadic (CP) decomposition, in which we handle the missing data by marginalization and consider three prominent alternating least squares procedures: multiplicative updates, column-wise updating, and row-wise updating of the component matrices. We examine three gene expression prediction scenarios: data missing at random, whole genes missing, and whole areas missing within a subject. We find that the column-wise updating approach, also known as HALS, is the most efficient when fitting the model. We further observe that the non-negativity constrained CP model predicts gene expression better than the subject average when data are missing at random. When whole genes or whole areas are missing, it is in general better to predict by subject averages. However, we find that when whole genes are missing from all subjects, the model-based predictions are useful. When analyzing the components derived for one of the best-predicting model orders, we find that the identified components in general constitute localized regions of the brain. Non-negative tensor factorization based on marginalization thus forms a promising framework for imputing missing values and characterizing gene expression in the human brain. However, care has to be taken, in particular when predicting gene expression levels for a whole missing brain region: our analysis indicates that this requires a substantial number of subjects with data for that region in order for the model predictions to be reliable.
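To make the idea of fitting a non-negative CP model under marginalization concrete, the following is a minimal sketch and not the authors' implementation. It assumes that marginalization means the missing entries are simply left out of the least squares objective, and it uses a masked multiplicative update, which is only one of the three procedures named in the abstract (HALS and row-wise updating are not shown). The function names `reconstruct`, `masked_ntf_mu`, and `masked_rel_error`, the tensor sizes, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def reconstruct(factors):
    """Build the 3-way tensor sum_r a_r (outer) b_r (outer) c_r from CP factors."""
    A, B, C = factors
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def masked_ntf_mu(X, mask, rank, n_iter=200, eps=1e-12, seed=0):
    """Non-negative CP fit by masked multiplicative updates (illustrative only).

    X    : 3-way array; values where mask is False are ignored
    mask : boolean array, True where an entry is observed
    """
    rng = np.random.default_rng(seed)
    # Strictly positive random start so that no factor entry is stuck at zero.
    factors = [rng.random((dim, rank)) + 0.1 for dim in X.shape]
    X_obs = np.where(mask, X, 0.0)  # zero out the missing entries
    # einsum patterns that contract the tensor with the two other factors
    patterns = ['ijk,jr,kr->ir', 'ijk,ir,kr->jr', 'ijk,ir,jr->kr']
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            X_hat = np.where(mask, reconstruct(factors), 0.0)
            numerator = np.einsum(patterns[mode], X_obs, *others)
            denominator = np.einsum(patterns[mode], X_hat, *others)
            # Ratio of non-negative terms keeps the factor non-negative.
            factors[mode] *= numerator / (denominator + eps)
    return factors

def masked_rel_error(X, X_hat, mask):
    """Relative Frobenius error restricted to the entries selected by mask."""
    return np.linalg.norm((X - X_hat)[mask]) / np.linalg.norm(X[mask])

# Synthetic example: a rank-2 non-negative tensor with about 40% of the
# entries removed at random (sizes are arbitrary, not taken from the paper).
rng = np.random.default_rng(1)
true_factors = [rng.random((dim, 2)) for dim in (12, 10, 4)]
X = reconstruct(true_factors)
mask = rng.random(X.shape) > 0.4          # True = observed
factors = masked_ntf_mu(X, mask, rank=2)
held_out_error = masked_rel_error(X, reconstruct(factors), ~mask)
```

Here `held_out_error` measures how well the fitted model imputes the entries that were hidden during fitting, which corresponds to the "missing at random" scenario. Comparing it against a simple baseline such as a per-subject mean would mirror the kind of evaluation described in the abstract.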
Year: 2014
DOI: 10.1109/MLSP.2014.6958919
Venue: Machine Learning for Signal Processing
Keywords: biology computing, data analysis, genetics, least mean squares methods, matrix algebra, Allen human brain atlas, CP, HALS, alternating least squares procedures, column-wise component matrix updating, gene expressions modeling, high dimensional multiway structured data analysis, human brain, missing data, multiplicative updates, nonnegative tensor factorization, nonnegativity constrained canonical polyadic decomposition, row-wise component matrix updating, CandeComp/PARAFAC, Marginalization, Missing Values, Non-negative Matrix Factorization, Non-negative Tensor Factorization
DocType: Conference
ISSN: 2161-0363
Citations: 1
PageRank: 0.39
References: 0
Authors: 2
Name | Order | Citations | PageRank
Soren Fons Vind Nielsen | 1 | 4 | 1.84
Morten Mørup | 2 | 704 | 51.29