Title: Empirical analysis and evaluation of approximate techniques for pruning regression bagging ensembles
Abstract
Identifying the optimal subset of regressors in a regression bagging ensemble is a difficult task whose cost is exponential in the size of the ensemble. In this article we analyze two approximate techniques especially devised to address this problem. The first strategy constructs a relaxed version of the problem that can be solved using semidefinite programming (SDP). The second one is based on modifying the order of aggregation of the regressors. Ordered aggregation is a simple forward selection algorithm that incorporates at each step the regressor that most reduces the training error of the current subensemble. Both techniques can be used to identify subensembles that are close to the optimal ones, which can be obtained by exhaustive search at a much larger computational cost. Experiments on a wide variety of synthetic and real-world regression problems show that pruned ensembles composed of only 20% of the initial regressors often have better generalization performance than the original bagging ensembles. These improvements are due to a reduction in the bias and the covariance components of the generalization error. Subensembles obtained using either SDP or ordered aggregation generally outperform subensembles obtained by other ensemble pruning methods and ensembles generated by the Adaboost.R2 algorithm, negative correlation learning or regularized linear stacked generalization. Ordered aggregation has a slightly better overall performance than SDP in the problems investigated. However, the difference is not statistically significant. Ordered aggregation has the further advantage that it produces a nested sequence of near-optimal subensembles of increasing size with no additional computational cost.
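The ordered-aggregation procedure described in the abstract (greedily adding, at each step, the regressor that most reduces the training error of the current subensemble) can be sketched as follows. This is a minimal illustrative implementation in plain Python, not the authors' code; the names `preds`, `y`, and `ordered_aggregation` are placeholders, and training mean squared error is assumed as the selection criterion.

```python
def ordered_aggregation(preds, y):
    """Greedy ordered aggregation for pruning a regression bagging ensemble.

    preds: list of per-regressor prediction lists over the training set
           (one inner list per bagged regressor).
    y:     list of training targets.

    Returns the regressor indices in order of incorporation. Every prefix
    of the returned list is a nested subensemble, so e.g. the first 20%
    of the indices gives the pruned ensemble discussed in the abstract.
    """
    n_samples = len(y)
    remaining = list(range(len(preds)))
    order = []
    cum = [0.0] * n_samples  # running sum of the selected regressors' predictions

    while remaining:
        k = len(order) + 1  # size of the candidate subensemble

        def train_mse(i):
            # MSE of averaging the current subensemble plus candidate i.
            return sum(((cum[j] + preds[i][j]) / k - y[j]) ** 2
                       for j in range(n_samples)) / n_samples

        best = min(remaining, key=train_mse)
        order.append(best)
        remaining.remove(best)
        cum = [c + p for c, p in zip(cum, preds[best])]

    return order
```

Because the full ordering is computed in one pass, subensembles of every size come at no extra cost, which is the practical advantage over the SDP formulation noted in the abstract.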
Year: 2011
DOI: 10.1016/j.neucom.2011.03.001
Venue: Neurocomputing
Keywords: bagging, empirical analysis, semidefinite programming, ensemble learning, generalization performance, ordered aggregation, ensemble pruning method, exponential cost, pruning regression, generalization error, boosting, additional computational cost, regression, larger computational cost, initial regressors, near-optimal subensembles, approximate technique, ensemble pruning, original bagging ensemble, statistical significance, exhaustive search
DocType: Journal
Volume: 74
Issue: 12-13
ISSN:
Citations: 18
PageRank: 0.73
References: 31
Authors: 3
Name                     | Order | Citations | PageRank
Daniel Hernández-Lobato  | 1     | 440       | 26.10
Gonzalo Martínez-Muñoz   | 2     | 524       | 23.76
Alberto Suárez           | 3     | 67        | 6.28