Title
An Approach To Sample Selection From Big Data For Classification
Abstract
Traditional sample selection methods have high computational complexity and are time consuming when used to compress large data sets. To avoid these shortcomings, we propose a new sample selection method based on non-stable cut points. Using the basic property of a convex function that its extreme values occur at the endpoints of intervals, the method measures the degree to which a sample is an endpoint by labeling non-stable cut points. Samples with a higher endpoint degree are then selected, which avoids calculating distances between samples. The method aims to compress data sets and improve computational efficiency without affecting classification accuracy. Experiments show that the proposed algorithm performs very well when compressing data sets with a higher degree of imbalance. The method is also experimentally confirmed to be strongly resistant to noise.
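The selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not define a non-stable cut point, so the code assumes the usual reading: after sorting the samples on one attribute, the cut between two neighbouring samples is non-stable when their class labels differ. Each sample's score counts how many such cuts it borders across all attributes, and no pairwise distances are computed. The function names `endpoint_scores` and `select_samples` and the `threshold` parameter are hypothetical, chosen for illustration only. Tied attribute values are not treated specially here, and this is not the authors' implementation.

```python
def endpoint_scores(X, y):
    """Count, for each sample, how many non-stable cut points it borders.

    Assumption: a cut point between two neighbours in the sorted order of
    an attribute is non-stable when the neighbours have different labels.
    """
    n_samples = len(X)
    n_features = len(X[0]) if n_samples else 0
    scores = [0] * n_samples
    for j in range(n_features):
        # Sort sample indices by the j-th attribute (one sort per attribute).
        order = sorted(range(n_samples), key=lambda i: X[i][j])
        for left, right in zip(order, order[1:]):
            if y[left] != y[right]:
                # Both neighbours are endpoints of a class-changing interval.
                scores[left] += 1
                scores[right] += 1
    return scores


def select_samples(X, y, threshold=1):
    """Return indices of samples whose endpoint score reaches the threshold."""
    scores = endpoint_scores(X, y)
    return [i for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    X = [[1, 10], [2, 1], [3, 2], [4, 3]]
    y = [0, 0, 1, 1]
    print(endpoint_scores(X, y))
    print(select_samples(X, y, threshold=2))
```

The cost is dominated by one sort per attribute, which is where the saving over distance-based selection would come from under these assumptions.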
Year
2016
Venue
2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Keywords
Big data classification, Sample selection, Non-stable cut points, Decision tree
Field
Data set, Algorithm design, Computer science, Extreme value theory, Convex function, Artificial intelligence, Statistical classification, Big data, Machine learning, Computational complexity theory, Fold (higher-order function)
DocType
Conference
ISSN
1062-922X
Citations
0
PageRank
0.34
References
0
Authors
4
Name, Order, Citations, PageRank
Sheng Xing, 1, 0, 0.34
Yu-Lin He, 2, 90, 6.31
Hong Zhu, 3, 17, 2.24
Xizhao Wang, 4, 3593, 166.16