Title
Novel mislabeled training data detection algorithm.
Abstract
Mislabeled training data are a kind of noise that arises in many applications. Because they degrade learning, many filtering techniques have been proposed to identify and eliminate them. The ensemble learning-based filter (EnFilter), which employs ensemble classifiers, is the most widely used. EnFilter first divides the noisy training dataset into several subsets. Each noisy subset is then checked by multiple classifiers trained on the other noisy subsets. Because the data used to train these classifiers are themselves noisy, the quality of the classifiers cannot be guaranteed, which may lead to poor noise identification results. The problem becomes more serious when the noise ratio in the training dataset is high. To address it, this work proposes a straightforward but effective approach: instead of noisy data, nearly noise-free (NNF) data are used to train the classifiers, since such data are expected to yield more reliable classifiers. To this end, a novel NNF data extraction approach is also proposed. Experimental results on a set of benchmark datasets illustrate the utility of the proposed approach.
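The baseline filtering scheme described in the abstract (partition the data, train several classifiers on the other subsets, check the held-out subset) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation and not the proposed NNF extraction step. The base learners (1-NN, 3-NN, nearest centroid), the majority-vote rule, the round-robin split, and the names `ensemble_filter`, `knn_predict`, and `centroid_predict` are all assumptions made for the example; the abstract does not specify them.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def centroid_predict(train, x):
    """Predict the label whose class mean is closest to x."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)

    def dist_to_mean(points):
        mean = [sum(col) / len(points) for col in zip(*points)]
        return sum((a - b) ** 2 for a, b in zip(mean, x))

    return min(groups, key=lambda label: dist_to_mean(groups[label]))

# Hypothetical base learners; the source does not say which classifiers are used.
CLASSIFIERS = [
    lambda train, x: knn_predict(train, x, 1),
    lambda train, x: knn_predict(train, x, 3),
    centroid_predict,
]

def ensemble_filter(data, n_subsets=3):
    """Return indices of samples whose label a majority of classifiers reject."""
    suspects = []
    for fold in range(n_subsets):
        # Train on every subset except the one being checked (round-robin split).
        train = [s for i, s in enumerate(data) if i % n_subsets != fold]
        for i, (x, label) in enumerate(data):
            if i % n_subsets != fold:
                continue
            wrong = sum(clf(train, x) != label for clf in CLASSIFIERS)
            if wrong > len(CLASSIFIERS) / 2:  # assumed majority-vote rule
                suspects.append(i)
    return sorted(suspects)

# Toy data: two well-separated clusters; index 4 carries the wrong label.
data = [
    ((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.3), 0),
    ((0.1, 0.1), 1),
    ((5.0, 5.0), 1), ((5.1, 5.2), 1), ((5.2, 5.1), 1), ((5.3, 5.3), 1),
    ((4.9, 5.0), 1),
]
print(ensemble_filter(data))
```

In this toy run, the mislabeled point also sits in the training data for two of the three folds. There it misleads the 1-NN classifier on neighboring clean samples, a small-scale instance of the weakness the abstract attributes to training on noisy subsets.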
Year: 2018
DOI: 10.1007/s00521-016-2589-9
Venue: Neural Computing and Applications
Keywords: Mislabeled data filtering, Ensemble learning, Noise-free data
Field: Training set, Noisy data, Pattern recognition, Computer science, Random subspace method, Artificial intelligence, Data extraction, Ensemble learning, Machine learning
DocType: Journal
Volume: 29
Issue: 10
ISSN: 1433-3058
Citations: 0
PageRank: 0.34
References: 32
Authors: 4
Name, Order, Citations, PageRank
Yuan Wei Wei, 1, 312, 29.13
Donghai Guan, 2, 348, 48.29
Qi Zhu, 3, 147, 11.68
Tinghuai Ma, 4, 314, 40.76