Abstract |
---|
It is widely known in the machine learning community that class noise can be (and often is) detrimental to inducing a model of the data. Many current approaches use a single, often biased, measurement to determine if an instance is noisy. A biased measure may work well on certain data sets, but it can also be less effective on a broader set of data sets. In this paper, we present noise identification using classifier diversity (NICD) -- a method for deriving a less biased noise measurement and integrating it into the learning process. To lessen the bias of the noise measure, NICD selects a diverse set of classifiers (based on their predictions of novel instances) to determine which instances are noisy. We examine NICD as a technique for filtering, instance weighting, and selecting the base classifiers of a voting ensemble. We compare NICD with several other noise handling techniques that do not consider classifier diversity on a set of 54 data sets and 5 learning algorithms. NICD significantly increases the classification accuracy over the other considered approaches and is effective across a broad set of data sets and learning algorithms. |

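
The filtering and weighting idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes every candidate classifier has already produced out-of-sample predictions for each training instance, it uses pairwise disagreement as a stand-in for the diversity criterion (the record does not say which measure the paper uses), and it scores an instance by the fraction of selected classifiers that contradict its label. All function names, the classifier names, and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def disagreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers predict different labels."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def select_diverse(predictions, k):
    """Greedily pick up to k classifier names whose predictions disagree the most.

    predictions: dict mapping classifier name -> list of predicted labels.
    Assumption: pairwise disagreement approximates "diversity".
    """
    names = sorted(predictions)
    k = min(k, len(names))
    if k < 2:
        return names[:k]
    # Seed the selection with the pair that disagrees most often.
    first, second = max(
        combinations(names, 2),
        key=lambda p: disagreement(predictions[p[0]], predictions[p[1]]),
    )
    chosen = [first, second]
    while len(chosen) < k:
        remaining = [n for n in names if n not in chosen]
        # Add the classifier with the largest total disagreement to those chosen.
        best = max(
            remaining,
            key=lambda n: sum(
                disagreement(predictions[n], predictions[c]) for c in chosen
            ),
        )
        chosen.append(best)
    return chosen


def noise_scores(predictions, labels, chosen):
    """Per-instance fraction of chosen classifiers that contradict the given label."""
    return [
        sum(predictions[c][i] != y for c in chosen) / len(chosen)
        for i, y in enumerate(labels)
    ]


def filter_noisy(instances, labels, scores, threshold=0.5):
    """Filtering: keep only instances whose noise score is at most the threshold."""
    return [(x, y) for x, y, s in zip(instances, labels, scores) if s <= threshold]


def instance_weights(scores):
    """Instance weighting: down-weight instances in proportion to their noise score."""
    return [1.0 - s for s in scores]


# Toy data: the last label is contradicted by every classifier.
labels = [0, 0, 1, 1, 0]
predictions = {
    "tree": [0, 0, 1, 1, 1],
    "knn": [0, 0, 1, 1, 1],
    "nb": [0, 1, 1, 1, 1],
}
chosen = select_diverse(predictions, k=2)
scores = noise_scores(predictions, labels, chosen)
kept = filter_noisy(["a", "b", "c", "d", "e"], labels, scores)
```

Filtering discards suspect instances outright, while the weights from `instance_weights` could instead be passed to any learner that accepts per-instance weights. Which of the two works better in practice is the kind of question the paper's experiments address.
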

Property | Value |
---|---|
Year | 2014 |
Venue | CoRR |
Field | Data set, Weighting, Pattern recognition, Noise measurement, Voting, Computer science, Filter (signal processing), Artificial intelligence, Classifier (linguistics), Machine learning |
DocType | Journal |
Volume | abs/1403.1893 |
Citations | 0 |
PageRank | 0.34 |
References | 27 |
Authors | 2 |

Name | Order | Citations | PageRank |
---|---|---|---|
Michael R. Smith | 1 | 79 | 11.34 |
Tony R. Martinez | 2 | 1364 | 100.44 |