Title | ||
---|---|---|
Perturbation and candidate analysis to combat overfitting of gene expression microarray data. |
Abstract | ||
---|---|---|
Analysis of gene expression microarray datasets carries a high risk of over-fitting (finding spurious patterns) because the datasets are feature-rich but case-poor. This paper describes our ongoing efforts to develop a method that combats over-fitting and determines the strongest signal in the dataset. A hybrid of a genetic algorithm and a support vector machine (GA-SVM), together with added Gaussian noise (with a manually set noise gain), is used to discover feature sets of minimal size that accurately classify the cases under cross-validation. Initial results on a colorectal cancer dataset show that the strongest signal (a modest number of candidates) can be found by a binary search. |
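
The abstract names two ingredients, Gaussian noise with a manual gain and a binary search, without giving details. The sketch below is only an illustration under stated assumptions, not the authors' procedure: it assumes the search runs over the noise gain, and that the number of candidate features surviving the perturbation does not increase as the gain grows. The names `add_gaussian_noise`, `binary_search_gain`, `count_candidates`, and the toy `strengths` list are all hypothetical; in a real pipeline `count_candidates` would stand in for a full GA-SVM run under cross-validation.

```python
import random


def add_gaussian_noise(rows, gain, rng):
    """Return a copy of `rows` with zero-mean Gaussian noise added.

    `gain` is the noise standard deviation (the manually chosen knob).
    """
    return [[x + rng.gauss(0.0, gain) for x in row] for row in rows]


def binary_search_gain(count_candidates, target, lo=0.0, hi=1.0, tol=1e-3):
    """Approximate the largest gain at which at least `target` candidates survive.

    Assumes count_candidates(gain) is non-increasing in gain, with
    count_candidates(lo) >= target and count_candidates(hi) < target.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if count_candidates(mid) >= target:
            lo = mid  # enough candidates survive: try more noise
        else:
            hi = mid  # too few survive: back off
    return lo


# Hypothetical stand-in for the expensive GA-SVM step: a feature "survives"
# if its made-up signal strength exceeds the noise gain.
strengths = [0.9, 0.7, 0.4, 0.2, 0.1, 0.05]


def count_candidates(gain):
    return sum(1 for s in strengths if s > gain)


if __name__ == "__main__":
    rng = random.Random(0)
    noisy = add_gaussian_noise([[1.0, 2.0], [3.0, 4.0]], gain=0.1, rng=rng)
    print(noisy)
    print(binary_search_gain(count_candidates, target=2))
```

With the toy strengths above, the search settles just below a gain of 0.7, the point past which fewer than two features remain. A real candidate count would be noisy rather than exactly monotone, so repeated runs per gain value would likely be needed.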
Year | DOI | Venue |
---|---|---|
2011 | 10.1504/IJCBDD.2011.044443 | Int. J. Comput. Biol. Drug Des. |
Keywords | Field | DocType |
overfitting,svm,support vector machines,genetic algorithms,cross validation,colorectal cancer,dna microarray,roc curve,gas | Data mining,Biology,Microarray analysis techniques,Artificial intelligence,Binary search algorithm,Overfitting,Support vector machine,Bioinformatics,Cross-validation,Gaussian noise,Spurious relationship,Machine learning,DNA microarray | Journal |
Volume | Issue | ISSN |
4 | 4 | 1756-0756 |
Citations | PageRank | References |
1 | 0.43 | 0 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ravi Mathur | 1 | 7 | 1.94 |
J. David Schaffer | 2 | 2003 | 1593.96 |
Walker H. Land Jr. | 3 | 50 | 11.04 |
John J. Heine | 4 | 29 | 6.07 |
Jonathan M Hernandez | 5 | 1 | 0.43 |
Timothy Yeatman | 6 | 7 | 1.60 |