Title
A frequency-based gene selection method with random forests for gene data analysis
Abstract
Gene selection is an important step in the analysis of gene data sets in which the number of genes greatly exceeds the number of samples. In this paper, we propose a new method that uses a random forest model to select genes from high-dimensional gene data sets. In this method, Breiman's random forest algorithm is first used to generate a random forest model from a high-dimensional data set. Then, the features appearing in the component tree models of the random forest are analyzed using feature-correlation measures. The features are divided into two sets: those appearing in the roots of the component trees and those appearing in other nodes of the trees. The frequency of each feature is calculated, and the features whose frequencies exceed given thresholds are selected as candidates. Finally, the correlation of each candidate feature with the class feature is measured with symmetrical uncertainty, and the top features (those with the highest symmetrical uncertainty values) are selected. Nineteen gene data sets were used to evaluate the new gene selection method. The comparison results showed that the models built with the gene features selected by the new method outperformed other random forest models in classification accuracy.
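The root/non-root frequency step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each tree is assumed to be given as a flat list of per-node feature indices with the root at position 0 and leaves marked -2 (the layout of scikit-learn's `tree_.feature` array). The use of relative frequencies, and the rule that a feature becomes a candidate if it exceeds *either* the root threshold or the node threshold, are assumptions; the abstract does not specify them. The helper names `feature_frequencies` and `select_candidates` are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable, Sequence

LEAF = -2  # scikit-learn marks leaf nodes with feature index -2 (TREE_UNDEFINED)


def feature_frequencies(trees: Iterable[Sequence[int]]):
    """Return (root_freq, node_freq): feature index -> relative frequency.

    Assumption: root_freq is the fraction of trees whose root splits on the
    feature; node_freq is the fraction of non-root split nodes using it.
    """
    root_counts: Counter = Counter()
    node_counts: Counter = Counter()
    n_trees = 0
    for feats in trees:
        n_trees += 1
        if feats[0] != LEAF:          # a single-leaf tree has no root split
            root_counts[feats[0]] += 1
        for f in feats[1:]:
            if f != LEAF:             # count only internal (split) nodes
                node_counts[f] += 1
    n_nodes = sum(node_counts.values())
    root_freq = {f: c / n_trees for f, c in root_counts.items()}
    node_freq = {f: c / n_nodes for f, c in node_counts.items()}
    return root_freq, node_freq


def select_candidates(trees, root_threshold: float, node_threshold: float) -> set:
    """Hypothetical rule: keep features above either frequency threshold."""
    root_freq, node_freq = feature_frequencies(trees)
    roots = {f for f, p in root_freq.items() if p > root_threshold}
    others = {f for f, p in node_freq.items() if p > node_threshold}
    return roots | others


# Four toy trees in pre-order node layout (feature indices, LEAF for leaves).
trees = [
    [0, 2, LEAF, LEAF, 5, LEAF, LEAF],
    [0, 5, LEAF, LEAF, LEAF],
    [2, 0, LEAF, LEAF, LEAF],
    [0, 5, LEAF, LEAF, LEAF],
]
print(select_candidates(trees, root_threshold=0.5, node_threshold=0.5))  # {0, 5}
```

Feature 0 splits the root of three of the four trees, and feature 5 accounts for three of the five non-root splits, so both pass a 0.5 threshold. With a fitted scikit-learn forest `rf`, the input would be `[est.tree_.feature for est in rf.estimators_]`.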
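The final ranking step uses symmetrical uncertainty, defined for discrete variables as SU(X, Y) = 2 [H(X) + H(Y) - H(X, Y)] / [H(X) + H(Y)], which lies in [0, 1]. The sketch below computes it and keeps the top k candidates. It assumes the gene columns have already been discretized (the abstract does not say how continuous expression values are handled). The helper names are hypothetical, and returning 0 when both variables are constant is a convention chosen here.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy in bits of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * [H(X) + H(Y) - H(X, Y)] / [H(X) + H(Y)]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables constant; SU is undefined, use 0 by convention
    hxy = entropy(list(zip(x, y)))  # joint entropy from paired observations
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def top_k_by_su(columns: dict, labels: Sequence[Hashable], k: int) -> list:
    """Rank candidate features (name -> discretized column) by SU with the class."""
    scores = {name: symmetrical_uncertainty(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


labels = [0, 0, 1, 1]
columns = {
    "g1": ["lo", "lo", "hi", "hi"],  # perfectly predicts the class
    "g2": ["lo", "hi", "lo", "hi"],  # independent of the class
    "g3": ["a", "a", "a", "b"],      # partially informative
}
print(top_k_by_su(columns, labels, k=2))  # ['g1', 'g3']
```

Normalizing information gain by the two entropies keeps SU from favouring features with many distinct values, which is the usual motivation for preferring it over raw information gain.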
Year: 2016
DOI: 10.1109/RIVF.2016.7800293
Venue: 2016 IEEE RIVF International Conference on Computing & Communication Technologies, Research, Innovation, and Vision for the Future (RIVF)
Keywords: frequency-based gene selection method, random forest model, high dimensional gene data analysis, Breiman random forest algorithm, component tree models, feature correlations, symmetrical uncertainty, feature selection
Field: Data modeling, Data set, Clustering high-dimensional data, Gene, Pattern recognition, Measurement uncertainty, Feature extraction, Correlation, Artificial intelligence, Random forest, Mathematics
DocType: Conference
ISBN: 978-1-5090-4135-0
Citations: 0
PageRank: 0.34
References: 8
Authors: 5
Name | Order | Citations | PageRank
Thanh Trinh | 1 | 0 | 0.34
dingming wu | 2 | 28 | 5.47
Salman Salloum | 3 | 2 | 0.71
Tung T Nguyen | 4 | 57 | 8.29
Joshua Zhexue Huang | 5 | 1365 | 82.64