Title
Feature Selection For High-Dimensional Data Through Instance Vote Combining
Abstract
Supervised feature selection (FS) is used to select a discriminative and non-redundant subset of features in classification problems with high-dimensional inputs. In this paper, feature selection is posed as a problem akin to set covering, where the goal is to select a subset of features that together cover the instances. To solve this formulation, we quantify the local relevance of each feature (i.e., the votes assigned to it by instances), which captures the extent to which the feature helps classify individual instances correctly. We propose to combine the instance votes across features to infer their joint local relevance. The votes are combined on the basis of geometric principles underlying classification and feature spaces. Further, we show how such instance vote combining may be employed to derive a heuristic search strategy for selecting a relevant and non-redundant subset of features. We illustrate the effectiveness of our approach by evaluating classification performance and robustness to data variations on publicly available benchmark datasets. We observe that the proposed method outperforms state-of-the-art mutual-information-based FS techniques and performs comparably to other heuristic approaches that solve the set-covering formulation of feature selection.
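The set-covering view described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify how votes are computed or combined, so the vote rule below (a feature votes for an instance when the instance's nearest neighbour along that single feature shares its label) and the plain greedy cover are assumptions chosen only for illustration. The function names `instance_votes` and `greedy_cover` are hypothetical.

```python
import numpy as np

def instance_votes(X, y):
    """Hypothetical vote rule (an assumption, not taken from the paper):
    feature j votes for instance i when i's nearest neighbour along
    feature j alone, excluding i itself, has the same label as i."""
    n, d = X.shape
    votes = np.zeros((n, d), dtype=bool)
    for j in range(d):
        col = X[:, j].astype(float)
        dist = np.abs(col[:, None] - col[None, :])  # pairwise 1-D distances
        np.fill_diagonal(dist, np.inf)              # ignore self-matches
        nearest = dist.argmin(axis=1)
        votes[:, j] = y[nearest] == y
    return votes

def greedy_cover(votes, max_features=None):
    """Plain greedy set cover: repeatedly pick the feature that covers
    the most instances not yet covered by the selected features."""
    n, d = votes.shape
    uncovered = np.ones(n, dtype=bool)
    selected = []
    limit = d if max_features is None else max_features
    while uncovered.any() and len(selected) < limit:
        gains = (votes & uncovered[:, None]).sum(axis=0)
        best = int(gains.argmax())
        if gains[best] == 0:  # no remaining feature covers anything new
            break
        selected.append(best)
        uncovered &= ~votes[:, best]
    return selected

# Toy data: feature 0 separates the classes, feature 1 does not.
X_demo = np.array([[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 1.1]])
y_demo = np.array([0, 0, 1, 1])
selected_demo = greedy_cover(instance_votes(X_demo, y_demo))
print(selected_demo)
```

Because a feature's already-covered instances stop counting toward later gains, a feature that only repeats the votes of a selected one contributes nothing, which is one simple way a covering formulation discourages redundancy. The joint, geometry-based vote combining the paper proposes is not reproduced here.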
Year
2020
DOI
10.1145/3371158.3371177
Venue
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD (CoDS-COMAD 2020)
Keywords
Feature selection, Filter-based method, Set-covering problem, Instance voting, Graph modularity, Vote combining
Field
Clustering high-dimensional data, Feature selection, Pattern recognition, Computer science, Artificial intelligence
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
2
Name, Order, Citations, PageRank
Lily Chamakura, 1, 0, 0.34
Goutam Saha, 2, 5, 5.15