Abstract |
---|
Embedded feature selection can be performed by analyzing the variables used in a Random Forest. Such a multivariate selection takes into account the interactions between variables but is not straightforward to interpret in a statistical sense. We propose a statistical procedure to measure variable importance that tests whether variables are significantly useful in combination with others in a forest. We show experimentally that this new importance index correctly identifies relevant variables. The top of the variable ranking is largely correlated with Breiman's importance index based on a permutation test. Our measure has the additional benefit of producing p-values from the forest voting process. Such p-values offer a very natural way to decide which features are significantly relevant while controlling the false discovery rate. Practical experiments are conducted on synthetic and real data, including low- and high-dimensional datasets for binary and multi-class problems. Results show that the proposed technique is effective and outperforms recent alternatives, reducing the computational complexity of the selection process by an order of magnitude while maintaining similar performance. |
Year | DOI | Venue
---|---|---|
2015 | 10.1016/j.neucom.2014.07.067 | Neurocomputing

Keywords | Field | DocType
---|---|---|
Feature selection, Tree ensembles, Significance tests, High-dimensional data analysis | Data mining, False discovery rate, Feature selection, Voting, Ranking, Multivariate statistics, Artificial intelligence, Random forest, Resampling, Mathematics, Machine learning, Computational complexity theory | Journal

Volume | ISSN | Citations
---|---|---|
150 | 0925-2312 | 0

PageRank | References | Authors
---|---|---|
0.34 | 7 | 2

Name | Order | Citations | PageRank |
---|---|---|---|
Jérôme Paul | 1 | 12 | 1.57 |
Pierre Dupont | 2 | 380 | 29.30 |