Abstract |
---|
Gene-expression microarray datasets often consist of a small number of samples and a large number of gene-expression measurements, usually on the order of thousands. Dimensionality reduction is therefore critical before any classification task. In this work, the iterative feature perturbation method (IFP), an embedded gene selector, is introduced and applied to four microarray cancer datasets: colon cancer, leukemia, Moffitt colon cancer, and lung cancer. We compare IFP with support vector machine recursive feature elimination (SVM-RFE) and with the t-test used as a feature filter, using a linear support vector machine as the base classifier. We also analyze the intersection of the gene sets selected by the three methods across the four datasets. Additional experiments first pre-selected the top 200 genes by t-test p value and then applied IFP and SVM-RFE to the reduced feature sets. These experiments showed an average performance improvement of up to 3.32% for IFP across the four datasets. A statistical analysis of both scenarios (Friedman/Holm test) showed that the highest accuracies came from the t-test filter in the experiments without gene pre-selection, and that IFP and SVM-RFE achieved higher classification accuracy on datasets first reduced by the t-test. The analysis indicates that the t-test is a good gene selector for microarray data. IFP achieved average class accuracy comparable or superior to that of SVM-RFE on three of the four datasets. Finally, the same or similar accuracies can be obtained with different sets of genes. |
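
The t-test pre-selection step (keeping the top 200 genes by p value before running IFP or SVM-RFE) can be sketched as follows. This is not the authors' code: it assumes a two-class problem, Student's two-sample t-test with pooled variance, and samples stored as rows of gene-expression values; the names `t_statistic`, `preselect_genes`, and `reduce_dataset` are hypothetical. Under the pooled-variance assumption every gene has the same degrees of freedom (n1 + n2 - 2), so sorting genes by |t| in descending order gives the same ranking as sorting by two-sided p value in ascending order, and no t-distribution CDF is needed.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Student's two-sample t statistic, assuming equal class variances (pooled)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    diff = mean(a) - mean(b)
    if se == 0:
        # Zero within-class variance: any mean difference separates the classes perfectly.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def preselect_genes(X, y, k=200):
    """Return indices of the k genes with the smallest two-sided t-test p values.

    X: list of samples, each a list of gene-expression values.
    y: list of class labels (0 or 1), one per sample; each class needs >= 2 samples.
    All genes share n1 + n2 - 2 degrees of freedom (pooled-variance assumption),
    so ranking by |t| (descending) matches ranking by p value (ascending).
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append(abs(t_statistic(a, b)))
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


def reduce_dataset(X, genes):
    """Keep only the selected gene columns, in ranked order."""
    return [[row[g] for g in genes] for row in X]
```

The reduced matrix returned by `reduce_dataset` would then be handed to a second-stage selector such as SVM-RFE or IFP; with fewer than 200 genes in the input, `preselect_genes` simply returns all of them in ranked order.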
Year | DOI | Venue |
---|---|---|
2012 | 10.1142/S0218001412600038 | International Journal of Pattern Recognition and Artificial Intelligence |
Keywords | Field | DocType |
---|---|---|
Feature selection, microarray analysis, gene selection, t-test, feature perturbation | Data mining, Dimensionality reduction, Gene, Feature selection, Microarray analysis techniques, Artificial intelligence, Classifier (linguistics), Pattern recognition, Support vector machine, Machine learning, Mathematics, Performance improvement, Statistical analysis | Journal |
Volume | Issue | ISSN |
---|---|---|
26 | 5 | 0218-0014 |
Citations | PageRank | References |
---|---|---|
4 | 0.40 | 15 |
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Juana Canul-Reich | 1 | 9 | 3.60 |
Lawrence O. Hall | 2 | 5543 | 335.87 |
Dmitry B. Goldgof | 3 | 2021 | 198.90 |
John N. Korecki | 4 | 7 | 1.13 |
Steven Eschrich | 5 | 89 | 10.81 |