Title
A two-stage ensemble method for the detection of class-label noise.
Abstract
The properties of bootstrap ensembles, such as bagging or random forest, are used to detect and handle label noise in classification problems. The first observation is that subsampling acts as a regularization mechanism that makes bootstrap ensembles more robust to this type of noise, and that appropriate values of the sampling rate can be estimated using out-of-bag data. The second observation is that the ensemble classifiers tend to make more errors on incorrectly labeled instances. Thus, instances for which a sufficiently large fraction of the ensemble predictors err are marked as noisy. Suitable values of this threshold, which are problem dependent, are determined by cross-validation using a wrapper method. Instances identified as noisy can then be either filtered (i.e., discarded from the training set) or cleaned by correcting their class labels. Finally, an ensemble is built afresh on the cleansed training data. Extensive experiments on classification problems from different application areas show that this procedure is effective for building accurate ensembles, even in the presence of high levels of class-label noise. © 2017 Elsevier B.V. All rights reserved.
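The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy 1-D binary problem, decision stumps as base learners, and error fractions computed only from ensemble members for which an instance is out-of-bag. The sampling rate (0.3) and the noise threshold (0.5) are fixed by hand here, whereas the abstract estimates them from out-of-bag data and by cross-validation. All function names are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Fit a 1-D threshold rule: predict `hi` if x >= t, else 1 - hi."""
    values = sorted(set(xs))
    # Candidates: one threshold below all points, plus midpoints between neighbours.
    cands = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in cands:
        for hi in (0, 1):
            errors = sum((hi if x >= t else 1 - hi) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, hi)
    return best[1], best[2]

def predict_stump(stump, x):
    t, hi = stump
    return hi if x >= t else 1 - hi

def train_ensemble(xs, ys, n_members, rate, rng):
    """Train stumps on bootstrap samples of size rate * n (drawn with replacement)."""
    n = len(xs)
    size = max(1, int(rate * n))
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(size)]
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        members.append((stump, set(idx)))  # keep the in-bag indices for OOB checks
    return members

def oob_error_fraction(members, xs, ys):
    """Per instance: fraction of out-of-bag members that disagree with its label."""
    fractions = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        errs = [predict_stump(s, x) != y for s, in_bag in members if i not in in_bag]
        fractions.append(sum(errs) / len(errs) if errs else 0.0)
    return fractions

def detect_noise(fractions, threshold):
    """Mark instances whose error fraction exceeds the threshold."""
    return [i for i, f in enumerate(fractions) if f > threshold]

# Toy data (assumption): two well-separated classes with four flipped labels.
rng = random.Random(0)
n = 100
xs = [i / 100 if i < 50 else i / 100 + 0.5 for i in range(n)]
true_ys = [0 if i < 50 else 1 for i in range(n)]
noisy_idx = [3, 12, 88, 95]
ys = [1 - y if i in noisy_idx else y for i, y in enumerate(true_ys)]

# Stage 1: subsampled ensemble on the noisy data, then noise detection.
members = train_ensemble(xs, ys, n_members=51, rate=0.3, rng=rng)
fractions = oob_error_fraction(members, xs, ys)
flagged = detect_noise(fractions, threshold=0.5)

# Stage 2: either filter the flagged instances or clean (flip) their labels,
# then build a new ensemble on the cleansed data.
filtered = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in flagged]
cleaned = [1 - y if i in flagged else y for i, y in enumerate(ys)]
final_members = train_ensemble(xs, cleaned, n_members=51, rate=0.3, rng=rng)

print("flagged instances:", flagged)
```

Flipping the label is only a valid form of cleaning in the binary case. For multiclass data, one plausible choice would be to relabel with the class most voted by the ensemble.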
Year
2018
DOI
10.1016/j.neucom.2017.11.012
Venue
NEUROCOMPUTING
Keywords
Noise detection, Ensemble learning, Subsampling, Robust classification, Random forest
Field
Training set, Pattern recognition, Computer science, Sampling (signal processing), Regularization (mathematics), Artificial intelligence, Noise detection, Random forest, Ensemble learning, Machine learning, Bootstrapping (electronics)
DocType
Journal
Volume
275
ISSN
0925-2312
Citations
5
PageRank
0.42
References
13
Authors
3
Name                      Order  Citations  PageRank
Maryam Sabzevari          1      10         2.57
Gonzalo Martínez-Muñoz    2      524        23.76
Alberto Suárez            3      487        22.33