Abstract | ||
---|---|---|
Modeling with real-world data is often plagued by missing values, which limit the applicability and validity of the developed model. Several algorithms exist in the literature to facilitate the analysis of incomplete data by imputing missing values. However, their imputation accuracy and practical applicability have not been systematically compared and studied, which makes the choice of an appropriate imputation method difficult. The focus of this paper is to conduct an exploratory analysis of popular missing data imputation algorithms. A new imputation algorithm based on clustering is also developed and demonstrated to be useful in a variety of ways to improve the efficiency of imputing missing values. These algorithms are benchmarked using datasets with significantly varying statistical properties. Based on the empirical results and theoretical analysis, a set of guidelines is proposed to assist in the selection of an appropriate imputation algorithm for a specific application. Finally, these guidelines are used in a process modeling case study that involves the analysis of the design of an atomizer. It was observed that the imputed values are qualitatively valid, thus providing evidence for the appropriateness of the proposed guidelines. |
Year | DOI | Venue |
---|---|---|
2007 | 10.3233/IDA-2007-11206 | Intell. Data Anal. |
Keywords | Field | DocType |
new imputation algorithm,iterative imputation algorithm,imputation accuracy,process modeling,incomplete data,missing value,popular missing data,appropriate imputation method,appropriate imputation algorithm,exploratory analysis,imputation algorithm,theoretical analysis,missing data,process model,clustering,principal components analysis | Data mining,Computer science,Process modeling,Algorithm,Artificial intelligence,Imputation (statistics),Missing data,Cluster analysis,Machine learning,Limiting,Principal component analysis,Missing data imputation | Journal |
Volume | Issue | ISSN |
11 | 2 | 1088-467X |
Citations | PageRank | References |
0 | 0.34 | 3 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Samuel H. Huang | 1 | 193 | 19.64 |
Ranganath Kothamasu | 2 | 18 | 1.30 |
Niharika Rapur | 3 | 0 | 0.34 |