Abstract | ||
---|---|---|
Many industrial and real-world datasets unavoidably contain missing values. Missing data has been addressed extensively in the statistical analysis literature and, to a lesser extent, in the classification literature. The ability to handle missing data is essential for classification because inadequate treatment of missing values may lead to large classification errors. Feature selection has been used successfully to improve classification, but it has been applied mainly to complete data. This paper develops a wrapper feature selection approach to classification with missing data and investigates its impact. Empirical results on 10 datasets with missing values, using C4.5 for evaluation and particle swarm optimisation as the search technique in feature selection, show that wrapper feature selection for missing data can not only improve the accuracy of the classifier but also reduce the complexity of the learned classification model. |
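
The wrapper idea in the abstract (a search procedure proposes feature subsets, and a classifier's accuracy on each subset serves as the fitness) can be illustrated with a minimal sketch. This is not the authors' implementation. To stay dependency-free it swaps C4.5 for a 1-nearest-neighbour classifier that skips missing values when computing distances. The function names, the toy data generator, the 0.5 selection threshold, and the PSO constants are all assumptions made for illustration.

```python
import random

def distance(a, b, subset):
    """Mean squared difference over selected features observed in both rows."""
    diffs = [(a[j] - b[j]) ** 2 for j in subset
             if a[j] is not None and b[j] is not None]
    return sum(diffs) / len(diffs) if diffs else float("inf")

def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of 1-NN on `subset` (stand-in for C4.5)."""
    if not subset:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: distance(row, X[k], subset))
        correct += (y[nearest] == y[i])
    return correct / len(X)

def decode(position, threshold=0.5):
    """A feature is selected when its position exceeds the (assumed) threshold."""
    return [j for j, p in enumerate(position) if p > threshold]

def pso_feature_selection(X, y, n_particles=10, n_iters=20, seed=0,
                          w=0.7298, c1=1.49618, c2=1.49618):
    """Continuous PSO over [0, 1]^d; fitness is the wrapper accuracy."""
    rng = random.Random(seed)
    d = len(X[0])

    def fitness(p):
        return loo_accuracy(X, y, decode(p))

    pos = [[rng.random() for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(d):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                # Clamp positions so they stay comparable to the threshold.
                pos[i][j] = min(1.0, max(0.0, pos[i][j] + vel[i][j]))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return decode(gbest), gbest_fit

def make_toy_data(n=40, seed=1, missing_rate=0.1):
    """Hypothetical data: feature 0 tracks the label, features 1-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [label + rng.gauss(0, 0.2)] + [rng.random() for _ in range(3)]
        X.append([None if rng.random() < missing_rate else v for v in row])
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    subset, acc = pso_feature_selection(X, y)
    print("selected features:", subset, "LOO accuracy:", round(acc, 3))
```

In a setup closer to the one the abstract describes, `loo_accuracy` would be replaced by the accuracy of a C4.5 decision tree trained on the selected features. The search loop itself would not need to change.
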
Year | DOI | Venue |
---|---|---|
2016 | 10.1007/978-3-319-31204-0_44 | Lecture Notes in Computer Science |
Keywords | Field | DocType |
Missing data, Feature selection, Classification, C4.5, Particle swarm optimisation | Particle swarm optimization, Pattern recognition, Feature selection, Computer science, Artificial intelligence, Missing data, Classifier (linguistics), Statistical analysis | Conference |
Volume | ISSN | Citations |
9597 | 0302-9743 | 3 |
PageRank | References | Authors |
0.38 | 5 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Cao Truong Tran | 1 | 29 | 4.71 |
Mengjie Zhang | 2 | 3777 | 300.33 |
Peter Andreae | 3 | 358 | 31.85 |
Bing Xue | 4 | 416 | 51.52 |