Title | ||
---|---|---|
Review On General Techniques And Packages For Data Imputation In R On A Real World Dataset |
Abstract | ||
---|---|---|
Collected data often consist of small samples with missing values, which makes data analysis less effective. Almost all algorithms for statistical data analysis need a complete data set, so missing values have to be dealt with during data preprocessing. Well-known methods for filling in missing values include mean imputation, K-nearest neighbours (kNN), and fuzzy K-means (FKM), among others. Many R packages offer imputation of missing values, but it is sometimes hard to find the appropriate algorithm for a particular dataset. With large datasets, these well-known methods sometimes cannot work as intended because they need too much memory to perform their operations. This paper provides an overview of imputing a large dataset by applying three different algorithms. The algorithms were compared under a missing completely at random (MCAR) assumption, using root mean squared error (RMSE) as the evaluation criterion. The experimental results show that the Random Forest algorithm can be quite useful for missing value imputation. |
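
The evaluation protocol named in the abstract (hide values under MCAR, impute them, score with RMSE) can be illustrated with a minimal sketch. This is not the authors' code and it is written in Python rather than R. The helper names `mask_mcar`, `impute_mean`, and `rmse_on_missing`, the synthetic data, and the 20% missing rate are all assumptions made for illustration. Mean imputation stands in for the three algorithms compared in the paper.

```python
import math
import random


def mask_mcar(values, missing_rate, rng):
    """Hide each entry independently with probability missing_rate (MCAR)."""
    return [None if rng.random() < missing_rate else v for v in values]


def impute_mean(column):
    """Replace every missing entry with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def rmse_on_missing(truth, masked, imputed):
    """RMSE between true and imputed values, over the hidden positions only."""
    errors = [(t - i) ** 2 for t, m, i in zip(truth, masked, imputed) if m is None]
    if not errors:
        return 0.0
    return math.sqrt(sum(errors) / len(errors))


# Synthetic, complete column (hypothetical data, not the paper's dataset).
rng = random.Random(0)
truth = [rng.gauss(10.0, 2.0) for _ in range(1000)]

masked = mask_mcar(truth, 0.2, rng)   # assumed 20% missing rate
imputed = impute_mean(masked)
score = rmse_on_missing(truth, masked, imputed)

print(f"missing: {sum(v is None for v in masked)}, RMSE: {score:.3f}")
```

Scoring only the hidden positions keeps the observed entries, which are reproduced exactly, from diluting the error. Swapping `impute_mean` for another imputer would let different methods be compared on the same mask.
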
Year | DOI | Venue |
---|---|---|
2018 | 10.1007/978-3-319-98446-9_36 | COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2018, PT II |
Keywords | Field | DocType |
Imputation, Missing values, Real data | Data mining, Computer science, Fuzzy logic, Mean squared error, Data pre-processing, Artificial intelligence, Missing data, Imputation (statistics), Random forest, Machine learning | Conference |
Volume | ISSN | Citations |
11056 | 0302-9743 | 0 |
PageRank | References | Authors |
0.34 | 2 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Fitore Muharemi | 1 | 3 | 1.13 |
Doina Logofatu | 2 | 17 | 16.74 |
Florin Leon | 3 | 71 | 15.03 |