Title
Discrete models for data imputation
Abstract
The paper is concerned with the problem of automatically detecting and correcting inconsistent or out-of-range data in a general process of statistical data collection. The proposed approach can deal with hierarchical data containing both qualitative and quantitative values. As is customary, erroneous data records are detected by formulating a set of rules. Erroneous records should then be corrected by modifying the erroneous data as little as possible, while causing minimum perturbation to the original frequency distributions of the data. This process is called imputation. By encoding the rules as linear inequalities, we convert imputation problems into integer linear programming problems. The proposed procedure is tested on a real-world census case. Results are extremely encouraging from both the computational and the data quality points of view.
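To make the idea of "rules as linear inequalities, correction as a minimum-change integer program" concrete, here is a minimal sketch under loudly stated assumptions. It is not the author's model. The fields (`age`, `married`, `employed`), the rule thresholds, the finite domains and the per-field change costs in `WEIGHTS` are all hypothetical. Each rule is written as `sum(coef * field) <= rhs`. For example, "a married person must be at least 16" becomes `16*married - age <= 0`, with `married` a 0/1 variable. The optimization is solved by brute-force enumeration rather than by an integer linear programming solver, which only works for tiny domains. The sketch also ignores the second criterion in the abstract (preserving the original frequency distributions). Ties between equal-cost corrections are broken purely by enumeration order.

```python
from itertools import product

# Hypothetical edit rules: each pair (coefs, rhs) means sum(coef * field) <= rhs.
RULES = [
    ({"married": 16, "age": -1}, 0),   # married == 1 implies age >= 16
    ({"employed": 14, "age": -1}, 0),  # employed == 1 implies age >= 14
    ({"age": 1}, 120),                 # range rule: age <= 120
    ({"age": -1}, 0),                  # range rule: age >= 0
]

# Hypothetical finite domains (qualitative fields coded as 0/1 indicators).
DOMAINS = {
    "age": range(0, 121),
    "married": (0, 1),
    "employed": (0, 1),
}

# Hypothetical cost of changing each field; the objective sums these costs.
WEIGHTS = {"age": 1.0, "married": 1.5, "employed": 1.2}


def satisfies(record, rules=RULES):
    """True if the record meets every linear inequality rule."""
    return all(
        sum(coef * record[field] for field, coef in coefs.items()) <= rhs
        for coefs, rhs in rules
    )


def impute(record, rules=RULES, domains=DOMAINS, weights=WEIGHTS):
    """Return (corrected_record, cost) with minimum weighted number of changes.

    Exhaustive enumeration stands in for an integer linear programming solver.
    """
    if satisfies(record, rules):
        return dict(record), 0.0
    fields = list(domains)
    best, best_cost = None, float("inf")
    for values in product(*(domains[f] for f in fields)):
        candidate = dict(zip(fields, values))
        # Objective: total weight of the fields whose value was changed.
        cost = sum(weights[f] for f in fields if candidate[f] != record[f])
        if cost < best_cost and satisfies(candidate, rules):
            best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    # A 12-year-old recorded as married violates the first rule.
    print(impute({"age": 12, "married": 1, "employed": 0}))
```

With these weights, changing `age` alone (cost 1.0) is cheaper than changing `married` (cost 1.5), so the sketch moves the age to the smallest value that satisfies the rule. In a realistic setting the same objective and constraints would be handed to an ILP solver, and a distribution-preserving criterion would decide among equally cheap corrections.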
Year
2004
DOI
10.1016/j.dam.2004.04.004
Venue
Discrete Applied Mathematics
Keywords
data correction, data quality point, statistical data collection, erroneous data record, erroneous record, integer programming, erroneous data, hierarchical data, discrete model, integer linear programming problem, data imputation, information reconstruction, general process, imputation problem, range data, data collection, data quality
Field
Data collection, Data mining, Combinatorics, Frequency distribution, Data quality, Algorithm, Integer programming, Linear programming, Imputation (statistics), Linear inequality, Hierarchical database model, Mathematics
DocType
Journal
Volume
144
Issue
1-2
Citations
4
PageRank
0.53
References
6
Authors
1
Name
Renato Bruni
Order
1
Citations
127
PageRank
15.79