Abstract | ||
---|---|---|
This paper describes the R package VSURF. Based on random forests, and applicable to both regression and classification problems, it returns two subsets of variables. The first is a subset of important variables that includes some redundancy, which can be relevant for interpretation; the second is a smaller subset corresponding to a model that tries to avoid redundancy and focuses more closely on the prediction objective. The two-stage strategy is based on a preliminary ranking of the explanatory variables using the random forests permutation-based importance score, and then proceeds with a stepwise forward strategy for variable introduction. The two proposals can be obtained automatically using data-driven default values, good enough to provide interesting results, but the strategy can also be tuned by the user. The algorithm is illustrated on a simulated example, and its applications to real datasets are presented. |
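
The two-stage idea in the abstract (rank variables by permutation importance, then introduce them stepwise) can be illustrated with a minimal, standard-library Python sketch. This is not the VSURF implementation and makes several assumptions: a k-nearest-neighbour regressor stands in for the random forest, a single hold-out split replaces out-of-bag error, and the names `knn_predict`, `mse`, `permutation_ranking`, `forward_selection`, `make_data` and the `min_gain` threshold are hypothetical choices made only for illustration.

```python
import random

def knn_predict(train_X, train_y, row, cols, k=5):
    """Average the targets of the k training rows closest to `row` on `cols`."""
    dists = sorted(
        (sum((tr[c] - row[c]) ** 2 for c in cols), ty)
        for tr, ty in zip(train_X, train_y)
    )
    return sum(ty for _, ty in dists[:k]) / k

def mse(train_X, train_y, test_X, test_y, cols):
    """Held-out mean squared error using only the variables in `cols`."""
    if not cols:  # no variables yet: fall back to predicting the training mean
        mean = sum(train_y) / len(train_y)
        return sum((y - mean) ** 2 for y in test_y) / len(test_y)
    errs = [(knn_predict(train_X, train_y, row, cols) - y) ** 2
            for row, y in zip(test_X, test_y)]
    return sum(errs) / len(errs)

def permutation_ranking(train_X, train_y, test_X, test_y, rng):
    """Stage 1: rank variables by the error increase caused by shuffling each one."""
    all_cols = list(range(len(train_X[0])))
    base = mse(train_X, train_y, test_X, test_y, all_cols)
    scores = []
    for j in all_cols:
        shuffled = [row[j] for row in test_X]
        rng.shuffle(shuffled)
        permuted = [row[:j] + [v] + row[j + 1:]
                    for row, v in zip(test_X, shuffled)]
        scores.append(mse(train_X, train_y, permuted, test_y, all_cols) - base)
    return sorted(all_cols, key=lambda j: scores[j], reverse=True)

def forward_selection(train_X, train_y, test_X, test_y, ranking, min_gain=0.05):
    """Stage 2: walk down the ranking, keeping a variable only if it lowers
    the held-out error by more than `min_gain` (relative, hypothetical value)."""
    selected = []
    best = mse(train_X, train_y, test_X, test_y, selected)
    for j in ranking:
        err = mse(train_X, train_y, test_X, test_y, selected + [j])
        if best - err > min_gain * best:
            selected.append(j)
            best = err
    return selected

def make_data(n, rng):
    """Toy data: x0 drives y, x1 is a noisy copy of x0, x2 and x3 are noise."""
    X, y = [], []
    for _ in range(n):
        x0 = rng.uniform(-1, 1)
        X.append([x0, x0 + rng.gauss(0, 0.1),
                  rng.uniform(-1, 1), rng.uniform(-1, 1)])
        y.append(3 * x0 + rng.gauss(0, 0.1))
    return X, y

rng = random.Random(0)
X, y = make_data(200, rng)
train_X, train_y, test_X, test_y = X[:150], y[:150], X[150:], y[150:]

ranking = permutation_ranking(train_X, train_y, test_X, test_y, rng)
selected = forward_selection(train_X, train_y, test_X, test_y, ranking)
print("ranking:", ranking)
print("selected:", selected)
```

In this toy setting the ranking plays the role of the importance-based ordering, and the forward pass plays the role of the prediction-oriented step that tries to leave out redundant variables. The package itself relies on random forests and data-driven thresholds rather than the fixed `min_gain` used here.
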
Year | DOI | Venue |
---|---|---|
2015 | 10.32614/rj-2015-018 | R JOURNAL |
Field | DocType | Volume |
---|---|---|
Data mining, Feature selection, Ranking, Regression, Computer science, Permutation, Redundancy (engineering), Artificial intelligence, Statistics, Random forest, Machine learning, R package | Journal | 7 |
Issue | ISSN | Citations |
---|---|---|
2 | 2073-4859 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Robin Genuer | 1 | 4 | 2.14 |
Jean-Michel Poggi | 2 | 174 | 16.19 |
Christine Tuleau-Malot | 3 | 87 | 5.23 |