Paper Info

Title
Review On General Techniques And Packages For Data Imputation In R On A Real World Dataset

Abstract
When we collect data, the result usually consists of small samples with missing values. As a consequence of this flaw, data analysis becomes less effective. Almost all algorithms for statistical data analysis need a complete data set, so missing values have to be dealt with during data preprocessing. Some well-known methods for filling in missing values are the mean, K-nearest neighbours (kNN), fuzzy K-means (FKM), etc. Quite a lot of R packages offer imputation of missing values, but it is sometimes hard to find the appropriate algorithm for a particular dataset. With large datasets, these known methods sometimes cannot work as intended because they need too much memory to perform their operations. This paper provides an overview of imputation on a sizeable dataset using three different algorithms. The algorithms were compared under a missing completely at random (MCAR) assumption, with root mean squared error (RMSE) as the evaluation criterion. The experimental results show that the Random Forest algorithm can be quite useful for missing-value imputation.
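
The evaluation idea in the abstract (hide values completely at random, impute them, then score the imputed values with RMSE) can be illustrated with a minimal sketch. This is not the authors' code: the record does not include their implementation, the paper works in R while this sketch uses Python, and plain column-mean imputation stands in for the compared algorithms. The function names (mask_mcar, impute_column_mean, rmse_on_masked), the synthetic data, and the 20% missing rate are all assumptions made for illustration.

```python
import math
import random


def mask_mcar(rows, rate, rng):
    """Return a copy of rows with each cell set to None with probability `rate`.

    Every cell is masked independently of its value, which is the MCAR assumption.
    """
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def impute_column_mean(rows):
    """Fill each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 if a column happens to be fully masked.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def rmse_on_masked(truth, masked, imputed):
    """RMSE between true and imputed values, computed only over the masked cells."""
    sq_errors = [
        (truth[i][j] - imputed[i][j]) ** 2
        for i, row in enumerate(masked)
        for j, v in enumerate(row)
        if v is None
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors)) if sq_errors else 0.0


# Synthetic complete data (hypothetical): two numeric columns, 200 rows.
rng = random.Random(0)
truth = [[rng.gauss(10.0, 2.0), rng.gauss(50.0, 5.0)] for _ in range(200)]

masked = mask_mcar(truth, rate=0.2, rng=rng)   # hide about 20% of the cells
imputed = impute_column_mean(masked)           # stand-in imputation method
score = rmse_on_masked(truth, masked, imputed)
print(f"RMSE of mean imputation on masked cells: {score:.3f}")
```

Swapping impute_column_mean for another method while keeping the same mask gives a like-for-like RMSE comparison, which is the general shape of the experiment the abstract describes.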

Year	DOI	Venue
2018	10.1007/978-3-319-98446-9_36	COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2018, PT II
Keywords	Field	DocType
Imputation, Missing values, Real data	Data mining, Computer science, Fuzzy logic, Mean squared error, Data pre-processing, Artificial intelligence, Missing data, Imputation (statistics), Random forest, Machine learning	Conference
Volume	ISSN	Citations
11056	0302-9743	0
PageRank	References	Authors
0.34	2	3

Authors (3 rows)

Name	Order	Citations	PageRank
Fitore Muharemi	1	3	1.13
Doina Logofatu	2	17	16.74
Florin Leon	3	71	15.03
