Title
Missing value imputation through shorter interval selection driven by Fuzzy C-Means clustering
Abstract
Missing data is a common and pivotal issue that generally causes a serious decrease in data quality, so it needs to be handled effectively. In this paper, we propose a missing value imputation approach driven by Fuzzy C-Means clustering that improves classification accuracy by referring only to the known feature values of selected instances. In particular, the missing values of each instance are imputed by selecting a shorter interval, based on the cluster membership value within a certain threshold limit for each feature; a short interval is expected to improve imputation effectiveness and give a more accurate estimate of the values than a long interval. The method is evaluated against state-of-the-art imputation methods on UCI datasets. The experimental results show that the proposed approach performs close to or better than those methods.
Highlights
•A missing data imputation technique called SISFCM is proposed for numeric features.
•The SISFCM approach improves imputation performance in comparison with other competitive state-of-the-art imputation methods.
•The method is robust to changes in the percentage of missing values.
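The abstract does not spell out the SISFCM procedure, so the following is only a minimal sketch of the general idea of membership-thresholded imputation, not the authors' algorithm. It assumes that cluster centers are fitted with plain Fuzzy C-Means on the complete rows, that memberships for incomplete rows come from partial distances over their observed features, and that a missing value is filled with the mean of the known values of instances whose membership in the same cluster reaches a threshold. Restricting the donors this way narrows the range of candidate values, which is one possible reading of the "shorter interval". The function names (`fcm`, `memberships`, `impute`) and the parameters `c`, `m`, and `threshold` are hypothetical.

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=100, seed=0):
    """Plain Fuzzy C-Means on complete rows; returns the cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))  # random memberships, rows sum to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        U = d ** (-2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)
    return centers

def memberships(X, centers, m=2.0):
    """Memberships from partial distances over observed (non-NaN) features."""
    observed = ~np.isnan(X)
    diff = np.where(observed[:, None, :], X[:, None, :] - centers[None], 0.0)
    d = np.sqrt((diff ** 2).sum(axis=2)) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def impute(X, c=2, threshold=0.6):
    """Fill each NaN with the mean of known values from high-membership donors.

    Assumption: donors are instances whose membership in the target
    instance's dominant cluster is at least `threshold`.
    """
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)
    centers = fcm(X[complete], c=c)
    U = memberships(X, centers)
    out = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        k = U[i].argmax()                         # dominant cluster of instance i
        donors = (U[:, k] >= threshold) & ~np.isnan(X[:, j])
        if not donors.any():                      # fallback: every known value
            donors = ~np.isnan(X[:, j])
        out[i, j] = X[donors, j].mean()
    return out

# Two well-separated groups with one missing value in each.
X_demo = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.2],
    [10.0, 10.0], [10.2, 9.9], [9.9, 10.1],
    [0.1, np.nan], [np.nan, 10.0],
])
filled = impute(X_demo, c=2, threshold=0.6)
```

In this toy example each missing entry is filled from the group its row belongs to, rather than from the global column mean, which would sit between the two groups.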
Year
2021
DOI
10.1016/j.compeleceng.2021.107230
Venue
Periodicals
Keywords
Incomplete data processing, Missing value handling, Missing value imputation, Fuzzy C-Means clustering
DocType
Journal
Volume
93
Issue
C
ISSN
0045-7906
Citations
0
PageRank
0.34
References
0
Authors
3
Name            Order   Citations   PageRank
Hufsa Khan      1       0           1.01
Xizhao Wang     2       3593        166.16
Han Liu         3       1           2.71