Title
Fast and simple dataset selection for machine learning.
Abstract
This paper discusses the task of data reduction and proposes a novel selection approach that allows the optimal point distribution of the selected data subset to be controlled. The approach is based on the estimation of probability density functions (pdfs). Owing to its structure, the method can select a subset either by approximating the pdf of the original dataset or by approximating an arbitrary, desired target pdf. The strategy evaluates the estimated pdfs solely on the selected data points, which results in a simple and efficient algorithm with low computational and memory demand. The performance of the approach is investigated in two scenarios. For representative subset selection of a dataset, the approach is compared to a recently proposed, more complex method and shows comparable results. To demonstrate the capability of matching a target pdf, a uniform distribution is chosen as an example. Here the method is compared to strategies for space-filling design of experiments and shows convincing results.
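The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of pdf-driven subset selection, not the authors' method. It assumes a Gaussian kernel density estimate with a fixed, user-chosen bandwidth and a greedy rule that adds the candidate whose subset density is lowest relative to the target. The names `gaussian_kernel_sum` and `greedy_pdf_selection` are hypothetical. For simplicity the sketch evaluates densities at every candidate point, which is not the cheaper evaluation on the selected points only that the abstract describes.

```python
import numpy as np


def gaussian_kernel_sum(points, centers, bandwidth):
    """Unnormalized Gaussian kernel sum of `centers`, evaluated at `points`."""
    # Pairwise squared distances, shape (len(points), len(centers)).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / bandwidth**2).sum(axis=1)


def greedy_pdf_selection(data, n_select, bandwidth, target_pdf=None, seed=0):
    """Greedily pick row indices of `data` whose density tracks a target pdf.

    Assumption (not from the paper): the next point is the candidate where
    the subset's kernel sum is smallest relative to the target density.
    """
    data = np.asarray(data, dtype=float)
    n = len(data)
    if target_pdf is None:
        # Representative selection: the target is the KDE of the full dataset.
        target = gaussian_kernel_sum(data, data, bandwidth) / n
    else:
        # Desired target pdf, e.g. a constant for a uniform distribution.
        target = np.array([float(target_pdf(x)) for x in data])

    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]  # arbitrary starting point
    density = gaussian_kernel_sum(data, data[selected], bandwidth)

    while len(selected) < n_select:
        # Low ratio means the region is under-represented in the subset.
        score = density / np.maximum(target, 1e-12)
        score[selected] = np.inf  # never pick the same point twice
        nxt = int(np.argmin(score))
        selected.append(nxt)
        # Update the running kernel sum with the newly added point only.
        density += gaussian_kernel_sum(data, data[nxt:nxt + 1], bandwidth)
    return np.array(selected)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(500, 2))
    uniform_idx = greedy_pdf_selection(x, 20, 0.5, target_pdf=lambda p: 1.0)
    representative_idx = greedy_pdf_selection(x, 20, 0.5)
    print(uniform_idx.shape, representative_idx.shape)
```

With a constant `target_pdf` the rule reduces to picking the point in the emptiest region, which behaves like a greedy space-filling design. With `target_pdf=None` dense regions of the original data are allowed to attract proportionally more points.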
Year
2019
DOI
10.1515/auto-2019-0010
Venue
AT-AUTOMATISIERUNGSTECHNIK
Keywords
machine learning, dataset selection, design of experiments, space-filling design, domain adaptation
Field
Artificial intelligence, Engineering, Machine learning
DocType
Journal
Volume
67
Issue
10
ISSN
0178-2312
Citations
0
PageRank
0.34
References
0
Authors (2)
Timm J. Peter (order 1, citations 0, PageRank 0.68)
Oliver Nelles (order 2, citations 99, PageRank 17.27)