On the purity of training and testing data for learning: The case of pedestrian detection. - Citegraph

Paper Info

Title
On the purity of training and testing data for learning: The case of pedestrian detection.

Abstract
The training and the evaluation of learning algorithms depend critically on the quality of data samples. We denote as pure the samples that identify clearly and without any ambiguity the class of objects of interest. For instance, in pedestrian detection algorithms, we consider as pure samples the ones containing persons who are fully visible and are imaged at a good resolution (larger than the detector window in size). The exclusive use of pure samples entails two kinds of problems. In training, it biases the detector to neglect slightly occluded and small-sized samples (which we denote as impure), thus reducing its detection rate in a real-world application. In testing, it leads to unfair evaluation and comparison of different detectors, since slightly impure samples, when detected, can be counted as false positives. In this paper, we study how a sensible use of impure samples can benefit both the training and the evaluation of pedestrian detection algorithms. We improve the labelling of one of the most widely used pedestrian data sets (INRIA), taking into account the degree of sample impurity. We observe that including partially occluded pedestrians in training improves performance, not only on partially visible examples, but also on the fully visible ones. Furthermore, we found that including pedestrians imaged at low resolutions is beneficial for detecting pedestrians in the same range of heights, leaving the performance on pure samples unchanged. However, including samples with too high a grade of impurity degrades performance, so a careful balance must be found. The proposed labelling will allow further studies on the role of impure samples in training pedestrian detectors and on devising fairer comparison metrics between different algorithms.

Year	2015
DOI	10.1016/j.neucom.2014.09.055
Venue	Neurocomputing
Keywords	Sample purity, Pedestrian detection, Machine learning, Partial occlusion, INRIA person data set, Labelling
Field	Pedestrian, Data set, Pattern recognition, Test data, Artificial intelligence, Ambiguity, Pedestrian detection, Detector, Mathematics, False positive paradox
DocType	Journal
Volume	150
ISSN	0925-2312
Citations	3
PageRank	0.43
References	30
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Matteo Taiana	1	39	3.68
Jacinto C. Nascimento	2	396	40.94
Alexandre Bernardino	3	710	78.77
