Title
Evolutionary computation with noise perturbation and cluster analysis to discover biomarker sets.
Abstract
In biomedical science, data mining techniques have been applied to extract statistically significant and clinically useful information from a given dataset. Finding biomarker gene sets for diseases can aid in understanding disease diagnosis, prognosis and therapy response. Gene expression microarrays have played an important role in such studies, yet their analysis has also drawn criticism. Analysis of these datasets carries a high risk of over-fitting (discovering spurious patterns) because they are feature-rich but case-poor. This paper describes a genetic algorithm and support vector machine (GA-SVM) hybrid combined with Gaussian noise perturbation (with a manually set noise gain) to combat over-fitting, determine the strongest signal in the dataset, and discover stable biomarker sets. A colon cancer gene expression microarray dataset is used to show that the strongest signal in the data (the optimal noise gain, at which a modest number of similar candidates emerge) can be found by a binary search. The diversity of candidates (measured by cluster analysis) is reduced by the noise perturbation, indicating that some of the patterns are being eliminated (we hope mostly spurious ones). Initial biological validation has been performed, and genes have different levels of significance to the candidates, although the discovered biomarker sets should be studied further to ascertain their biological significance and clinical utility. Furthermore, statistical validation shows that the strongest signal in the data is spurious and that the discovered biomarker sets should be rejected.
Year
2011
DOI
10.1016/j.procs.2011.08.030
Venue
Procedia Computer Science
Keywords
Az Value, Colon Cancer, Gene Expression Microarrays, Genetic Algorithm, Hierarchical Clustering, Over-Fitting, Support Vector Machines
Field
Hierarchical clustering, Data mining, Computer science, Support vector machine, Evolutionary computation, Biomarker (medicine), Artificial intelligence, Overfitting, Spurious relationship, Gaussian noise, DNA microarray, Machine learning
DocType
Journal
Volume
6
ISSN
1877-0509
Citations
2
PageRank
0.60
References
6
Authors
6
Name | Order | Citations | PageRank
Ravi Mathur | 1 | 7 | 1.94
J. David Schaffer | 2 | 2003 | 1593.96
Walker H. Land Jr. | 3 | 50 | 11.04
John J. Heine | 4 | 29 | 6.07
Steven Eschrich | 5 | 89 | 10.81
Timothy Yeatman | 6 | 7 | 1.60