Title
Perturbation and candidate analysis to combat overfitting of gene expression microarray data.
Abstract
Analysis of gene expression microarray datasets carries a high risk of over-fitting (finding spurious patterns) because the datasets are feature-rich but case-poor. This paper describes our ongoing efforts to develop a method to combat over-fitting and determine the strongest signal in the dataset. A GA-SVM hybrid, along with Gaussian noise (manual noise gain), is used to discover feature sets of minimal size that accurately classify the cases under cross-validation. Initial results on a colorectal cancer dataset show that the strongest signal (a modest number of candidates) can be found by a binary search.
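The perturbation idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the function names, the toy data, the choice to scale the noise by each feature's standard deviation, and the nearest-centroid classifier, which stands in for the SVM so the example needs only the Python standard library. The GA search and the binary search are omitted.

```python
import random
import statistics


def add_gaussian_noise(rows, gain, rng):
    """Return a copy of rows with zero-mean Gaussian noise added.

    Assumption: the noise standard deviation is gain * (per-feature SD).
    """
    sds = [statistics.pstdev(col) for col in zip(*rows)]
    return [[x + rng.gauss(0.0, gain * sd) for x, sd in zip(row, sds)]
            for row in rows]


def nearest_centroid_predict(train_x, train_y, sample):
    """Classify sample by the closest class mean (stand-in for the SVM)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [statistics.fmean(col) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(rows, labels, features):
    """Leave-one-out cross-validated accuracy on the chosen feature columns."""
    sub = [[row[j] for j in features] for row in rows]
    hits = 0
    for i in range(len(sub)):
        train_x = sub[:i] + sub[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, sub[i]) == labels[i]
    return hits / len(sub)


# Hypothetical toy data: column 0 separates the classes, column 1 does not.
rows = ([[0.1 * i, float(i % 2)] for i in range(10)]
        + [[10.0 + 0.1 * i, float(i % 2)] for i in range(10)])
labels = [0] * 10 + [1] * 10

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs are comparable
    for gain in (0.0, 0.5, 1.0, 2.0):
        noisy = add_gaussian_noise(rows, gain, rng)
        print(gain, loo_accuracy(noisy, labels, [0]), loo_accuracy(noisy, labels, [1]))
```

In this toy setting, a feature set that carries a real signal should keep its cross-validated accuracy as the noise gain rises for longer than one that fits a spurious pattern, which is the intuition behind using perturbation to rank candidates.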
Year: 2011
DOI: 10.1504/IJCBDD.2011.044443
Venue: Int. J. Comput. Biol. Drug Des.
Keywords: overfitting, svm, support vector machines, genetic algorithms, cross validation, colorectal cancer, dna microarray, roc curve, gas
Field: Data mining, Biology, Microarray analysis techniques, Artificial intelligence, Binary search algorithm, Overfitting, Support vector machine, Bioinformatics, Cross-validation, Gaussian noise, Spurious relationship, Machine learning, DNA microarray
DocType: Journal
Volume: 4
Issue: 4
ISSN: 1756-0756
Citations: 1
PageRank: 0.43
References: 0
Authors
6
Name
Order
Citations
PageRank
Ravi Mathur171.94
J. David Schaffer220031593.96
Walker H. Land Jr.35011.04
John J. Heine4296.07
Jonathan M Hernandez510.43
Timothy Yeatman671.60