Title
Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data.
Abstract
BACKGROUND: Because a typical microarray dataset contains a large number of genes, feature selection would be expected to play an important role in gene expression-based tissue classification, reducing noise and computational cost while also improving accuracy. Surprisingly, this does not appear to hold for all multiclass microarray datasets. The reason is that many feature selection techniques applied to microarray datasets are either rank-based, and hence ignore correlations between genes, or wrapper-based, which incurs high computational cost and often yields results that are difficult to reproduce. In studies that do consider correlations between genes, attempts to establish the merit of the proposed techniques are hampered by less-than-meticulous evaluation procedures, which produce overly optimistic estimates of accuracy.
RESULTS: We present two realistically evaluated correlation-based feature selection techniques that incorporate, in addition to the two existing criteria involved in forming a predictor set (relevance and redundancy), a third criterion called the degree of differential prioritization (DDP). DDP acts as a parameter that strikes the balance between relevance and redundancy, giving our techniques the novel ability to prioritize the optimization of relevance over redundancy, or vice versa. For nine well-known multiclass microarray datasets, this ability proves useful in achieving optimal classification accuracy with reasonably small predictor sets.
CONCLUSION: For multiclass microarray datasets, especially the GCM and NCI60 datasets, DDP enables our filter-based techniques to produce accuracies better than those reported in previous studies that employed similarly realistic evaluation procedures.
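The abstract does not spell out how DDP enters the predictor-set score, so the following is only a minimal sketch under stated assumptions. It assumes a multiplicative score W = V^α · U^(1−α), where V is the mean relevance of the selected genes (here a per-gene between-class to within-class sum-of-squares ratio), U is antiredundancy (1 minus the mean absolute Pearson correlation among selected genes), and α in [0, 1] plays the role of DDP. These measures, the greedy forward search, and the names `relevance`, `pearson`, `set_score`, and `select` are hypothetical illustrations, not the paper's exact definitions.

```python
from itertools import combinations
from statistics import mean


def relevance(values, labels):
    """Assumed relevance measure: between-class over within-class sum of squares (BSS/WSS)."""
    overall = mean(values)
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    class_means = {c: mean(vs) for c, vs in groups.items()}
    bss = sum(len(vs) * (class_means[c] - overall) ** 2 for c, vs in groups.items())
    wss = sum((v - class_means[c]) ** 2 for c, vs in groups.items() for v in vs)
    return bss / (wss + 1e-12)  # epsilon guards against zero within-class spread


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def set_score(subset, genes, rel, alpha):
    """Hypothetical DDP-weighted score W = V**alpha * U**(1 - alpha)."""
    v = mean(rel[g] for g in subset)  # mean relevance V
    if len(subset) < 2:
        u = 1.0  # a single gene has no redundancy
    else:
        r = mean(abs(pearson(genes[a], genes[b])) for a, b in combinations(subset, 2))
        u = max(0.0, 1.0 - r)  # antiredundancy U, clamped against rounding error
    return v ** alpha * u ** (1.0 - alpha)


def select(genes, labels, k, alpha):
    """Greedy forward selection of up to k genes for a given DDP value alpha in [0, 1]."""
    rel = {g: relevance(vals, labels) for g, vals in genes.items()}
    chosen = [max(rel, key=rel.get)]  # start from the most relevant gene
    while len(chosen) < min(k, len(genes)):
        remaining = [g for g in genes if g not in chosen]
        best = max(remaining, key=lambda g: set_score(chosen + [g], genes, rel, alpha))
        chosen.append(best)
    return chosen


# Toy data: g2 nearly duplicates g1; g3 is less relevant but almost uncorrelated with g1.
labels = [0, 0, 1, 1, 2, 2]
genes = {
    "g1": [1.0, 2.0, 5.0, 6.0, 9.0, 10.0],
    "g2": [1.0, 2.0, 5.0, 6.0, 9.0, 11.0],
    "g3": [5.0, 6.0, 1.0, 2.0, 5.0, 6.0],
}
print(select(genes, labels, 2, alpha=1.0))  # relevance only: ['g1', 'g2']
print(select(genes, labels, 2, alpha=0.0))  # antiredundancy only: ['g1', 'g3']
```

With α = 1 the assumed score ignores redundancy and keeps the near-duplicate g2; with α = 0 it ignores relevance and prefers the uncorrelated g3; intermediate values trade the two criteria off against each other.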
Year: 2006
DOI: 10.1186/1471-2105-7-320
Venue: BMC Bioinformatics
Keywords: feature selection, gene expression, computational biology, microarrays, bioinformatics, gene expression profiling, algorithms
Field: Data mining, Feature selection, Computer science, Prioritization, Correlation, Redundancy (engineering), Bioinformatics, DNA microarray, Gene expression profiling
Journal
Volume: 7
Issue: 1
ISSN: 1471-2105
Citations: 39
PageRank: 0.86
References: 13
Authors: 3
Name | Order | Citations | PageRank
Chia Huey Ooi | 1 | 58 | 4.25
Madhu Chetty | 2 | 369 | 39.17
Shyh Wei Teng | 3 | 151 | 21.02