Title
Demoting redundant features to improve the discriminatory ability in cancer data.
Abstract
The identification of a set of relevant but not redundant features is an important first step in building predictive and diagnostic models from biomedical data sets. Most commonly, individual features are ranked in terms of a quality criterion, out of which the best (first) k features are selected. However, feature ranking methods do not sufficiently account for interactions and correlations between the features. Thus, redundancy is likely to be encountered in the selected features. We present a new algorithm, termed Redundancy Demoting (RD), that takes an arbitrary feature ranking as input, and improves this ranking by identifying redundant features and demoting them to positions in the ranking in which they are not redundant. Redundant features are those that are correlated with other features and not relevant in the sense that they do not improve the discriminatory ability of a set of features. Experiments on two cancer data sets, one melanoma image data set and one lung cancer microarray data set, show that our algorithm greatly improves the feature rankings provided by the methods information gain, ReliefF and Student's t-test in terms of predictive power.
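The abstract describes the idea behind Redundancy Demoting but not its exact procedure, so the following is only a minimal sketch of that idea under stated assumptions, not the authors' algorithm. It assumes that "correlated" means an absolute Pearson correlation above a hypothetical `corr_threshold`, that "discriminatory ability" is measured by an illustrative nearest-centroid training accuracy, and that demoted features are simply moved to the end of the ranking rather than to the first position at which they stop being redundant. The names `demote_redundant`, `nearest_centroid_accuracy` and `corr_threshold` are invented for this illustration.

```python
import numpy as np


def nearest_centroid_accuracy(X, y):
    """Training accuracy of a nearest-centroid classifier (illustrative score only)."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every sample to every class centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y).mean())


def demote_redundant(X, y, ranking, corr_threshold=0.9,
                     score=nearest_centroid_accuracy):
    """Return a new ranking in which redundant features are moved to the end.

    Assumption: a feature is treated as redundant when it is strongly
    correlated with a feature already kept AND adding it does not raise
    the score of the kept set. Constant columns yield NaN correlations,
    which this sketch treats as "not correlated".
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept, demoted = [], []
    for f in ranking:
        correlated = any(corr[f, g] > corr_threshold for g in kept)
        if correlated:
            before = score(X[:, kept], y)
            after = score(X[:, kept + [f]], y)
            if after <= before:
                demoted.append(f)  # simplification: demote to the tail
                continue
        kept.append(f)
    return kept + demoted
```

The input `ranking` can come from any ranking method, such as information gain, ReliefF or a t-test, which matches the abstract's description of RD as a post-processing step on an arbitrary feature ranking. Swapping in a cross-validated score for `score` would give a less optimistic estimate of discriminatory ability than the training accuracy used here.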
Year
2009
DOI
10.1016/j.jbi.2009.05.006
Venue
Journal of Biomedical Informatics
Keywords
lung cancer, lung cancer microarray data, redundant feature, cancer data set, melanoma, biomedical data set, arbitrary feature ranking, melanoma image data, feature ranking, individual feature, correlation, ranking method, redundancy, knowledge discovery, data mining, discriminatory ability, feature selection, k feature, information gain, microarray data
Field
Data mining, Data set, Predictive power, Ranking, Pattern recognition, Feature selection, Computer science, Feature ranking, Correlation, Redundancy (engineering), Artificial intelligence, Knowledge extraction
DocType
Journal
Volume
42
Issue
4
ISSN
1532-0480
Citations
5
PageRank
0.52
References
11
Authors
6
Name                   Order  Citations  PageRank
Melanie Osl            1      71         6.83
Stephan Dreiseitl      2      338        34.80
F Cerqueira            3      5          0.52
M Netzer               4      18         1.30
Bernhard Pfeifer       5      47         10.17
Christian Baumgartner  6      100        14.03