Title
Combining Clustering Of Variables And Feature Selection Using Random Forests
Abstract
Standard approaches to high-dimensional supervised classification often include variable selection and dimension reduction. The proposed methodology combines clustering of variables and feature selection. Hierarchical clustering of variables makes it possible to build groups of correlated variables and to summarize each group by a synthetic variable. The originality of the approach is that the groups of variables are not known a priori. Moreover, the clustering approach handles both numerical and categorical variables. Among all the possible partitions, the most relevant synthetic variables are selected with a procedure based on random forests. Numerical performance is illustrated on simulated and real datasets. Selecting groups of variables makes the results easier to interpret.
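The pipeline described in the abstract (cluster the variables, summarize each group by a synthetic variable, then rank the synthetic variables with a random forest) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it handles numerical variables only, swaps in average-linkage clustering on 1 - correlation² from SciPy, uses the first principal component of each group as its synthetic variable, and ranks groups by scikit-learn's impurity-based importances instead of a dedicated selection procedure. The function names, the toy data, and the fixed number of groups are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestClassifier


def cluster_variables(X, n_clusters):
    """Group the columns of X by hierarchical clustering on 1 - corr^2 (assumed dissimilarity)."""
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - corr ** 2
    np.fill_diagonal(dist, 0.0)
    # Symmetrize and clip to guard against tiny floating-point asymmetries.
    dist = np.clip((dist + dist.T) / 2.0, 0.0, None)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def synthetic_variables(X, labels):
    """Summarize each group by the first principal component of its standardized columns."""
    columns = []
    for k in np.unique(labels):
        block = X[:, labels == k]
        block = (block - block.mean(axis=0)) / block.std(axis=0)
        U, s, _ = np.linalg.svd(block, full_matrices=False)
        columns.append(U[:, 0] * s[0])
    return np.column_stack(columns)


def rank_groups(S, y, seed=0):
    """Rank synthetic variables by random forest importance, most important first."""
    forest = RandomForestClassifier(n_estimators=300, random_state=seed)
    forest.fit(S, y)
    importances = forest.feature_importances_
    return np.argsort(importances)[::-1], importances


# Toy data (hypothetical): three latent factors, four noisy copies of each,
# and a class label that depends on the first factor only.
rng = np.random.default_rng(0)
n = 300
latent = rng.normal(size=(n, 3))
X = np.hstack([latent[:, [j]] + 0.3 * rng.normal(size=(n, 4)) for j in range(3)])
y = (latent[:, 0] > 0).astype(int)

labels = cluster_variables(X, n_clusters=3)
S = synthetic_variables(X, labels)
order, importances = rank_groups(S, y)
print("group label of each variable:", labels)
print("groups ranked by importance:", np.unique(labels)[order])
```

In this toy setting the group built from the first four columns should come out on top, since the label depends only on their shared latent factor. Choosing the number of groups is left open here; the abstract indicates that the method searches among the possible partitions rather than fixing one in advance.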
Year: 2021
DOI: 10.1080/03610918.2018.1563145
Venue: COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION
Keywords: clustering of variables, random forests, supervised classification, variable selection
DocType: Journal
Volume: 50
Issue: 2
ISSN: 0361-0918
Citations: 0
PageRank: 0.34
References: 8
Authors: 6
Name            Order  Citations  PageRank
Marie Chavent   1      202        16.79
Marie Chavent   2      202        16.79
Robin Genuer    3      4          2.14
Robin Genuer    4      4          2.14
Jérôme Saracco  5      2          2.51
Jérôme Saracco  6      2          2.51