Title
An Efficient Method To Estimate Bagging's Generalization Error
Abstract
Bagging (Breiman, 1994a) is a technique that tries to improve a learning algorithm's performance by using bootstrap replicates of the training set (Efron & Tibshirani, 1993; Efron, 1979). The computational requirements for estimating the resultant generalization error on a test set by means of cross-validation are often prohibitive: for leave-one-out cross-validation one needs to train the underlying algorithm on the order of mν times, where m is the size of the training set and ν is the number of replicates. This paper presents several techniques for estimating the generalization error of a bagged learning algorithm without invoking yet more training of the underlying learning algorithm (beyond that of the bagging itself), as is required by cross-validation-based estimation. These techniques all exploit the bias-variance decomposition (Geman, Bienenstock & Doursat, 1992; Wolpert, 1996). The best of our estimators also exploits stacking (Wolpert, 1992). In a set of experiments reported here, it was found to be more accurate than both the alternative cross-validation-based estimator of the bagged algorithm's error and the cross-validation-based estimator of the underlying algorithm's error. This improvement was particularly pronounced for small test sets. This suggests a novel justification for using bagging: more accurate estimation of the generalization error than is possible without bagging.
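To make the computational setting of the abstract concrete, here is a minimal Python sketch of plain bagging with an out-of-bag-style error estimate (in the spirit of Breiman's out-of-bag idea, not this paper's bias-variance/stacking estimators). It reuses the ν replicates that bagging already trains, so it adds no further fits; the names fit_ls and bagged_oob_mse are illustrative, not from the paper.

import numpy as np

rng = np.random.default_rng(0)

def fit_ls(X, y):
    # Ordinary least squares with an intercept, standing in for the
    # underlying learning algorithm.
    A = np.c_[np.ones(len(X)), X]
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict(w, X):
    return np.c_[np.ones(len(X)), X] @ w

def bagged_oob_mse(X, y, nu=50):
    # Train nu bootstrap replicates (the bagging itself) and score each
    # training point only with replicates whose bootstrap sample missed it,
    # so no training occurs beyond the nu fits bagging already requires.
    m = len(X)
    sq_err = np.zeros(m)
    counts = np.zeros(m)
    for _ in range(nu):
        idx = rng.integers(0, m, size=m)       # one bootstrap replicate
        oob = np.setdiff1d(np.arange(m), idx)  # points the replicate never saw
        w = fit_ls(X[idx], y[idx])
        sq_err[oob] += (predict(w, X[oob]) - y[oob]) ** 2
        counts[oob] += 1
    seen = counts > 0
    return float(np.mean(sq_err[seen] / counts[seen]))

# Toy data: noisy linear target.
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(200)
print("OOB-style MSE estimate:", bagged_oob_mse(X, y))

By contrast, leave-one-out cross-validation of the same bagged predictor would require on the order of m·ν additional fits, which is the cost the paper's techniques avoid.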
Year
1999
DOI
10.1023/A:1007519102914
Venue
Machine Learning
Keywords
Bagging, cross-validation, stacking, generalization error, bootstrap
Field
Training set, Pattern recognition, Computer science, Generalization error, Artificial intelligence, Cross-validation, Bootstrapping (statistics), Machine learning, Estimator, Test set
DocType
Journal
Volume
35
Issue
1
ISSN
1573-0565
Citations
17
PageRank
19.67
References
4
Authors
2
Name                 Order  Citations  PageRank
David H. Wolpert     1      4334       591.07
William G. Macready  2      1613       9.07