Title
Review On General Techniques And Packages For Data Imputation In R On A Real World Dataset
Abstract
When we collect data, it often consists of small samples with missing values, and this flaw makes data analysis less effective. Almost all algorithms for statistical data analysis require a complete dataset, so missing values have to be handled during data preprocessing. Well-known methods for filling in missing values include mean imputation, k-nearest neighbours (kNN), and fuzzy k-means (FKM). Many R packages offer missing-value imputation, but it is sometimes hard to find the appropriate algorithm for a particular dataset. Moreover, on large datasets these methods sometimes do not work as intended because they require too much memory. This paper provides an overview of imputing a large real-world dataset with three different algorithms, compared under a missing completely at random (MCAR) assumption using root mean squared error (RMSE) as the evaluation criterion. The experimental results show that the Random Forest algorithm can be quite useful for imputing missing values.
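The evaluation protocol in the abstract (remove values completely at random, impute them, then score the imputations with RMSE against the known true values) can be sketched as follows. This is a minimal illustration, not the authors' code: it is written in Python with NumPy rather than R, it uses hypothetical synthetic data instead of the paper's real-world dataset, and the function names (`make_mcar_mask`, `mean_impute`, `knn_impute`, `rmse_on_masked`) are invented for the example. Only mean and a simple kNN imputer are shown; Random Forest imputation is omitted to keep the sketch dependency-free.

```python
import numpy as np


def make_mcar_mask(shape, missing_rate, rng):
    # MCAR: every entry is removed with the same probability,
    # independently of the data values themselves.
    return rng.random(shape) < missing_rate


def mean_impute(x_missing):
    # Replace each NaN with the mean of the observed values in its column.
    filled = x_missing.copy()
    col_means = np.nanmean(x_missing, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled


def knn_impute(x_missing, k=5):
    # Fill each NaN with the average of that feature over the k nearest rows.
    # Distance = RMS difference over features observed in both rows.
    filled = x_missing.copy()
    observed = ~np.isnan(x_missing)
    col_means = np.nanmean(x_missing, axis=0)
    n_rows = x_missing.shape[0]
    for i in np.where(~observed.all(axis=1))[0]:
        dists = np.full(n_rows, np.inf)
        for j in range(n_rows):
            shared = observed[i] & observed[j]
            if j != i and shared.any():
                diff = x_missing[i, shared] - x_missing[j, shared]
                dists[j] = np.sqrt(np.mean(diff ** 2))
        for c in np.where(~observed[i])[0]:
            # Donor rows must have feature c observed and a finite distance.
            donors = np.where(observed[:, c] & np.isfinite(dists))[0]
            if donors.size == 0:
                filled[i, c] = col_means[c]  # fall back to the column mean
                continue
            nearest = donors[np.argsort(dists[donors])[:k]]
            filled[i, c] = x_missing[nearest, c].mean()
    return filled


def rmse_on_masked(x_true, x_imputed, mask):
    # RMSE over the artificially removed entries only; observed entries
    # are left unchanged by imputation and would dilute the score.
    err = x_true[mask] - x_imputed[mask]
    return float(np.sqrt(np.mean(err ** 2)))


# Hypothetical correlated data standing in for a real dataset.
rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=(n, 1))
x_true = np.hstack([base + 0.1 * rng.normal(size=(n, 1)) for _ in range(4)])

# Remove 10% of the entries completely at random.
mask = make_mcar_mask(x_true.shape, 0.1, rng)
x_missing = x_true.copy()
x_missing[mask] = np.nan

for name, imputer in [("mean", mean_impute), ("kNN", knn_impute)]:
    print(name, rmse_on_masked(x_true, imputer(x_missing), mask))
```

Because the values are removed artificially, the ground truth is known and RMSE can be computed directly; any further imputer (for example a Random Forest based one) would slot into the same loop as one more `(name, imputer)` pair.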
Year: 2018
DOI: 10.1007/978-3-319-98446-9_36
Venue: Computational Collective Intelligence, ICCCI 2018, Part II
Keywords: Imputation, Missing values, Real data
Field: Data mining, Computer science, Fuzzy logic, Mean squared error, Data pre-processing, Artificial intelligence, Missing data, Imputation (statistics), Random forest, Machine learning
DocType: Conference
Volume: 11056
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 2
Authors: 3
Name
Order
Citations
PageRank
Fitore Muharemi131.13
Doina Logofatu21716.74
Florin Leon37115.03