Abstract |
---|
Gene selection is an important step in the analysis of gene data sets in which the number of genes greatly exceeds the number of samples. In this paper, we propose a new method that uses a random forest model to select genes from high dimensional gene data sets. In this method, Breiman's random forest algorithm is first used to generate a random forest model from a high dimensional data set. Then, the features appearing in the component tree models of the random forest are analyzed using measures of feature correlation. Features are divided into two sets: those appearing in the roots of component trees and those appearing in other nodes of the trees. The frequency of each feature is calculated, and the features whose frequency is greater than the given thresholds are selected as candidates. Finally, the correlation of each candidate feature with the class feature is measured with symmetrical uncertainty, and the top features (those with the highest symmetrical uncertainty values) are selected. Nineteen gene data sets were used to evaluate the new gene selection method. The comparison results show that models built with the gene features selected by the new method outperformed other random forest models in classification accuracy. |
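
The selection steps described in the abstract can be sketched roughly as follows. This is only an illustrative reading, not the authors' implementation: the function names, the `trees` input format (one root feature plus a list of features at other nodes per tree), the union of the two thresholded sets, and the assumption that expression values are already discretized are all hypothetical choices made for the example. Symmetrical uncertainty is computed with its standard definition, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)).

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    mutual_info = h_x + h_y - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (h_x + h_y)


def select_features(trees, columns, labels, root_min, other_min, top_k):
    """Frequency filter followed by SU ranking (hypothetical helper).

    trees:   one (root_feature, [features at other nodes]) pair per tree.
    columns: dict mapping feature name -> list of discretized values.
    """
    # Count how often each feature appears at a root and at other nodes.
    root_freq = Counter(root for root, _ in trees)
    other_freq = Counter(f for _, others in trees for f in others)
    # Assumption: a feature is a candidate if it passes either threshold.
    candidates = {f for f, c in root_freq.items() if c > root_min}
    candidates |= {f for f, c in other_freq.items() if c > other_min}
    # Rank candidates by their SU with the class label, highest first.
    ranked = sorted(candidates,
                    key=lambda f: symmetrical_uncertainty(columns[f], labels),
                    reverse=True)
    return ranked[:top_k]


# Toy data: three discretized "genes", four samples, three trees.
labels = [0, 0, 1, 1]
columns = {"g1": [0, 0, 1, 1], "g2": [0, 1, 0, 1], "g3": [0, 0, 0, 1]}
trees = [("g1", ["g2", "g3"]), ("g1", ["g3"]), ("g3", ["g2"])]
selected = select_features(trees, columns, labels,
                           root_min=1, other_min=1, top_k=2)
```

In this toy example `g1` matches the class labels exactly, so its symmetrical uncertainty is 1, while `g2` is independent of the labels and scores 0. With a real forest, the `trees` list would have to be extracted from the fitted model, and continuous expression values would need to be discretized before computing entropies.
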
Year | DOI | Venue |
---|---|---|
2016 | 10.1109/RIVF.2016.7800293 | 2016 IEEE RIVF International Conference on Computing & Communication Technologies, Research, Innovation, and Vision for the Future (RIVF) |
Keywords | Field | DocType |
frequency-based gene selection method,random forest model,high dimensional gene data analysis,Breiman random forest algorithm,component tree models,feature correlations,symmetrical uncertainty,feature selection | Data modeling,Data set,Clustering high-dimensional data,Gene,Pattern recognition,Measurement uncertainty,Feature extraction,Correlation,Artificial intelligence,Random forest,Mathematics | Conference |
ISBN | Citations | PageRank |
978-1-5090-4135-0 | 0 | 0.34 |
References | Authors | |
8 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Thanh Trinh | 1 | 0 | 0.34 |
Dingming Wu | 2 | 28 | 5.47 |
Salman Salloum | 3 | 2 | 0.71 |
Tung T Nguyen | 4 | 57 | 8.29 |
Joshua Zhexue Huang | 5 | 1365 | 82.64 |