Abstract
---
The properties of bootstrap ensembles, such as bagging or random forest, are utilized to detect and handle label noise in classification problems. The first observation is that subsampling is a regularization mechanism that can be used to render bootstrap ensembles more robust to this type of noise. Furthermore, appropriate values of the sampling rate can be estimated using out-of-bag data. A second observation is that the ensemble classifiers tend to make more errors on incorrectly labeled instances. Thus, instances for which a sufficiently large fraction of ensemble predictors err are marked as noisy. Suitable values of this threshold, which are problem dependent, are determined by cross-validation using a wrapper method. Instances identified as noisy can then be either filtered (i.e. discarded for training) or cleaned by correcting their class labels. Finally, an ensemble is built afresh on these cleansed training data. Extensive experiments on classification problems from different areas of application show that this procedure is effective for building accurate ensembles, even in the presence of high levels of class-label noise. (C) 2017 Elsevier B.V. All rights reserved.
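The detection step described in the abstract can be sketched in a few lines: build a subsampled bagging ensemble, record how often each training instance is misclassified by the members for which it is out-of-bag, and flag instances whose out-of-bag error fraction exceeds a threshold. This is a minimal illustration, not the paper's implementation: it uses a toy nearest-centroid base learner in place of trees, synthetic one-dimensional data, and fixed values for the sampling rate, ensemble size, and threshold (which the authors instead estimate via out-of-bag data and cross-validation).

```python
import random
from statistics import mean

random.seed(0)

# Synthetic data: two well-separated classes, then flip a few labels.
X = [random.gauss(0.0, 1.0) for _ in range(10)] + \
    [random.gauss(10.0, 1.0) for _ in range(10)]
y = [0] * 10 + [1] * 10
noisy_idx = {2, 15}                 # deliberately mislabeled instances
for i in noisy_idx:
    y[i] = 1 - y[i]

def fit_centroids(xs, ys):
    """Toy base learner: per-class mean (a stand-in for a decision tree)."""
    return {c: mean(x for x, t in zip(xs, ys) if t == c)
            for c in set(ys)}

def predict(cents, x):
    """Assign the class whose centroid is nearest."""
    return min(cents, key=lambda c: abs(x - cents[c]))

# Bagging with subsampling (rate < 1), tracking out-of-bag (OOB) votes.
n, B, rate, threshold = len(X), 51, 0.5, 0.5   # illustrative values
errors = [0] * n                    # OOB misclassifications per instance
counts = [0] * n                    # times each instance was OOB

for _ in range(B):
    sample = [random.randrange(n) for _ in range(int(rate * n))]
    cents = fit_centroids([X[i] for i in sample], [y[i] for i in sample])
    if len(cents) < 2:
        continue                    # degenerate subsample: skip this member
    for i in set(range(n)) - set(sample):
        counts[i] += 1
        if predict(cents, X[i]) != y[i]:
            errors[i] += 1

# Flag instances misclassified by a large fraction of their OOB predictors.
flagged = {i for i in range(n)
           if counts[i] and errors[i] / counts[i] > threshold}
print(sorted(flagged))
```

On this cleanly separated toy data the flagged set coincides with the deliberately flipped labels; the flagged instances would then either be discarded (filtering) or have their labels corrected (cleaning) before retraining the ensemble.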
Year | DOI | Venue
---|---|---
2018 | 10.1016/j.neucom.2017.11.012 | NEUROCOMPUTING

Keywords | Field | DocType
---|---|---
Noise detection, Ensemble learning, Subsampling, Robust classification, Random forest | Training set, Pattern recognition, Computer science, Sampling (signal processing), Regularization (mathematics), Artificial intelligence, Noise detection, Random forest, Ensemble learning, Machine learning, Bootstrapping (electronics) | Journal

Volume | ISSN | Citations
---|---|---
275 | 0925-2312 | 5

PageRank | References | Authors
---|---|---
0.42 | 13 | 3
Name | Order | Citations | PageRank |
---|---|---|---|
Maryam Sabzevari | 1 | 10 | 2.57 |
Gonzalo Martínez-Muñoz | 2 | 524 | 23.76 |
Alberto Suárez | 3 | 487 | 22.33 |