Title
Semi-supervised imputation for microarray missing value estimation
Abstract
Missing data are inevitable in gene expression microarray experiments, for many reasons, and the integrity of the data plays a key role in the performance of downstream analysis. Much progress has therefore been made on estimating missing values. However, when the missing rate is large, most current estimation methods cannot achieve high estimation precision. In this paper, inspired by semi-supervised learning with collaborative training, we propose a new imputation method called COIM (COllaborative IMputation). COIM estimates missing values using a collaborative imputation strategy based on Bayesian principal component analysis (BPCA) and local least squares (LLS). It exploits both the global correlation information and the local structure in the incomplete dataset by having BPCA and LLS share their estimated results with each other. Furthermore, COIM recovers genes that have fewer missing entries first. Numerical results demonstrate that COIM outperforms the compared algorithms in terms of normalized root mean square error (NRMSE), especially for datasets with large missing rates or few complete genes.
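Two ingredients named in the abstract can be illustrated in a few lines: the NRMSE score and the ordering of genes by number of missing entries. The sketch below is not the authors' implementation. It assumes one commonly used NRMSE definition (root of the mean squared error over the missing entries divided by the variance of the true values at those entries), and it uses a row-mean placeholder where COIM would use BPCA and LLS. The function names `nrmse` and `recovery_order` and the toy matrix are hypothetical.

```python
import numpy as np

def nrmse(estimated, true, missing_mask):
    """NRMSE over the masked entries (assumed definition, see lead-in)."""
    diff = estimated[missing_mask] - true[missing_mask]
    return np.sqrt(np.mean(diff ** 2) / np.var(true[missing_mask]))

def recovery_order(data):
    """Row indices of incomplete genes, fewest missing entries first."""
    n_missing = np.isnan(data).sum(axis=1)
    incomplete = np.flatnonzero(n_missing > 0)
    return incomplete[np.argsort(n_missing[incomplete], kind="stable")]

# Toy matrix: 4 genes (rows) x 4 samples (columns); values are made up.
true = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0],
                 [1.0, 1.0, 2.0, 2.0],
                 [0.0, 1.0, 0.0, 1.0]])
observed = true.copy()
observed[0, 1] = np.nan                                  # gene 0: 1 missing
observed[1, 0] = observed[1, 2] = observed[1, 3] = np.nan  # gene 1: 3 missing
observed[3, 0] = observed[3, 3] = np.nan                 # gene 3: 2 missing
mask = np.isnan(observed)

# Placeholder imputer (row mean); NOT BPCA, LLS, or COIM.
row_means = np.nanmean(observed, axis=1, keepdims=True)
estimated = np.where(mask, row_means, observed)

order = recovery_order(observed)      # genes to recover, easiest first
score = nrmse(estimated, true, mask)  # lower is better; 0 means exact recovery
```

With this definition, a score near 1 means the imputer does about as well as a constant guess at the mean of the held-out values, so any useful method should score well below 1.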
Year
2014
DOI
10.1109/BIBM.2014.6999172
Venue
BIBM
Keywords
collaborative imputation strategy,missing value imputation,microarray missing value estimation,semisupervised learning,large missing rate,data missing,bayes methods,normalized root mean square error,learning (artificial intelligence),genetics,downstream analysis,gene recovery,data analysis,semi-supervised learning,bayesian principal component analysis,coim,high-estimation precision,gene expression microarray experiments,least mean squares methods,genetic algorithms,microarray gene expression data,global correlation information,local least squares,collaborative training,bioinformatics,principal component analysis,data integrity,bpca,semisupervised imputation,correlation,estimation,collaboration,gene expression
Field
Least squares,Data mining,Computer science,Bayesian principal component analysis,Local structure,Normalized root mean square error,Correlation,Artificial intelligence,Imputation (statistics),Missing data,Machine learning
DocType
Conference
ISSN
2156-1125
Citations
1
PageRank
0.36
References
12
Authors
3
Name            Order  Citations  PageRank
Hui-Hui Li      1      2          0.71
Feng-Feng Shao  2      1          0.70
Guo-Zheng Li    3      23         2.12