Title
A New Approach for Feature Selection from Microarray Data Based on Mutual Information
Abstract
Mutual information (MI) is a powerful concept for correlation-centric applications, and many works have used it to select features from microarray gene expression data. One merit of MI is that, unlike many heuristic methods, it rests on a mature theoretical foundation. When applied to microarray data, however, it faces two challenges. First, because microarray data contain a large number of features (i.e., genes), the true distributions of the expression values of some genes may be distorted by noise. Second, evaluating inter-group mutual information requires estimating multivariate distributions, which is very difficult, if not impossible. To address these problems, we propose a new MI-based feature selection approach for microarray data. Our approach relies on two strategies. The first, relevance boosting, requires a desirable feature to show substantial additional relevance to the class labels beyond that of the already selected features. The second, feature interaction enhancing, probabilistically compensates for the feature interactions that simple aggregation-based evaluation misses. We justify our approach both theoretically and experimentally: a synthetic dataset shows the statistical significance of the proposed strategies, and real-life datasets show that our approach improves on the performance of existing methods.
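The idea of demanding "additional relevance beyond the already selected features" can be illustrated with conditional mutual information. The sketch below is not the authors' relevance-boosting formula. It is a greedy selector in the style of conditional mutual information maximization (CMIM), which scores a candidate f by its worst-case extra relevance, min over selected s of I(f; C | s). This uses only three-variable distributions and so sidesteps the multivariate estimation problem mentioned above. Several assumptions apply: expression values are assumed to be pre-discretized (for example into under-, normal-, and over-expressed levels); MI is estimated with the plain plug-in (empirical frequency) estimator; the stopping threshold `min_gain` and all function names are hypothetical; and the feature-interaction-enhancing strategy is not modeled.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(*columns: Sequence[Hashable]) -> float:
    """Plug-in (empirical) joint entropy, in bits, of one or more discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mi(x, y, z) -> float:
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def select_features(
    features: Dict[str, Sequence[Hashable]],
    labels: Sequence[Hashable],
    k: int,
    min_gain: float = 0.01,  # hypothetical "substantial relevance" threshold, in bits
) -> List[str]:
    """CMIM-style greedy forward selection (illustrative, not the paper's method).

    The first pick maximizes I(f; C). Each later pick maximizes the worst-case
    extra relevance min_s I(f; C | s) over the already selected features s.
    Selection stops early when the best score is at or below min_gain.
    """
    selected: List[str] = []
    remaining = dict(features)
    while remaining and len(selected) < k:
        scores = {}
        for name, f in remaining.items():
            if not selected:
                scores[name] = mutual_information(f, labels)
            else:
                scores[name] = min(
                    conditional_mi(f, labels, features[s]) for s in selected
                )
        best = max(scores, key=scores.get)  # ties resolve to the first-inserted name
        if scores[best] <= min_gain:
            break  # no candidate adds enough relevance beyond the selected set
        selected.append(best)
        del remaining[best]
    return selected


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    a = [0, 0, 0, 1, 1, 1, 1, 1]  # relevant, but wrong on sample 3
    b = list(a)                   # exact duplicate of a (redundant)
    c = [1, 1, 1, 0, 1, 1, 1, 1]  # weak alone, but resolves a's error
    print(select_features({"a": a, "b": b, "c": c}, labels, k=3))  # ['a', 'c']
```

In the toy run, `b` ties with `a` on raw relevance but contributes nothing once `a` is selected, so it is skipped. `c` is weak on its own (about 0.14 bits) but adds about 0.45 bits given `a`, so it is chosen second. Taking the minimum over selected features is a conservative choice: a candidate must add information beyond every feature already chosen, not just on average.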
Year
2016
DOI
10.1109/TCBB.2016.2515582
Venue
IEEE/ACM Trans. Comput. Biology Bioinform.
Keywords
Mutual information, Redundancy, Labeling, Approximation methods, Gene expression, Noise measurement, Data analysis
Field
Data mining, Noise measurement, Feature selection, Computer science, Redundancy (engineering), Microarray analysis techniques, Artificial intelligence, Heuristic, Mutual information, Boosting (machine learning), Bioinformatics, Machine learning, Gene expression profiling
DocType
Journal
Volume
13
Issue
6
ISSN
1557-9964
Citations
4
PageRank
0.40
References
24
Authors
2
Name | Order | Citations | PageRank
Jian Tang | 1 | 526 | 148.30
Shuigeng Zhou | 2 | 2089 | 207.00