Title
Effective Bayesian-network-based missing value imputation enhanced by crowdsourcing
Abstract
During data collection, incompleteness is one of the most serious data quality problems to deal with. Traditional imputation methods mostly rely on statistical and machine learning techniques. However, both types of methods are limited in accuracy because they lack sufficient information about the missing data. To obtain more information, recent methods resort to external sources such as knowledge bases or the World Wide Web. Unfortunately, such methods may still be of limited help, since knowledge bases may contain little information about the missing values, and web data may be too noisy. To tackle these issues, this paper adopts crowdsourcing as the external source: hundreds of thousands of ordinary workers on crowdsourcing platforms can provide high-quality information based on contextual knowledge and human cognitive ability. To reduce the cost, a joint model is proposed for imputation that integrates crowdsourcing into the process of Bayesian inference. We first construct a Bayesian network over the attributes in the dataset, and then infer the missing attribute values by Bayesian inference. To improve the accuracy of the Bayesian inference, we outsource a small number of informative tasks to crowd workers, where the informative tasks are selected based on uncertainty and influence. The proposed approach is evaluated through extensive experiments on real-world datasets with a simulated crowd and two real crowdsourcing platforms. The experimental results show that our approach achieves better performance than other imputation approaches.
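The pipeline in the abstract (estimate a Bayesian network, infer each missing value from its posterior, and send only the most informative cells to the crowd) can be illustrated with a minimal sketch. This is not the authors' implementation. The three-attribute schema, the fixed structure A -> B, A -> C (the paper constructs its network from the data), the Laplace-smoothed parameter estimates, exact inference by enumeration, and the task score (posterior entropy as uncertainty, multiplied by a hypothetical "influence" proxy that counts rows sharing the same missing-value pattern) are all assumptions made for illustration. Crowd answers are simulated with a plain dictionary.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy schema: every attribute is discrete with a small domain.
DOMAINS = {"A": ["x", "y"], "B": ["p", "q"], "C": ["u", "v"]}
# Assumed network structure (fixed here, not learned): A -> B and A -> C.
PARENTS = {"A": [], "B": ["A"], "C": ["A"]}
ORDER = ["A", "B", "C"]  # topological order of the network

def fit_cpts(rows, alpha=1.0):
    """Estimate P(attr | parents) from complete rows, with Laplace smoothing alpha."""
    complete = [r for r in rows if all(r[a] is not None for a in ORDER)]
    cpts = {}
    for attr in ORDER:
        pa = PARENTS[attr]
        counts = Counter((tuple(r[p] for p in pa), r[attr]) for r in complete)
        k = len(DOMAINS[attr])
        table = {}
        for pa_vals in product(*(DOMAINS[p] for p in pa)):
            total = sum(counts[(pa_vals, v)] for v in DOMAINS[attr])
            for v in DOMAINS[attr]:
                table[(pa_vals, v)] = (counts[(pa_vals, v)] + alpha) / (total + alpha * k)
        cpts[attr] = table
    return cpts

def joint(cpts, full):
    """P(full assignment) as the product of each node's conditional probability."""
    p = 1.0
    for attr in ORDER:
        pa_vals = tuple(full[q] for q in PARENTS[attr])
        p *= cpts[attr][(pa_vals, full[attr])]
    return p

def posterior(cpts, row, target):
    """P(target | observed values of row), summing out the other missing attributes."""
    hidden = [a for a in ORDER if row[a] is None and a != target]
    scores = {}
    for v in DOMAINS[target]:
        total = 0.0
        for hv in product(*(DOMAINS[h] for h in hidden)):
            full = dict(row)
            full[target] = v
            full.update(zip(hidden, hv))
            total += joint(cpts, full)
        scores[v] = total
    z = sum(scores.values())
    return {v: s / z for v, s in scores.items()}

def entropy(dist):
    """Shannon entropy in bits, used here as the uncertainty of a posterior."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def select_tasks(rows, cpts, budget):
    """Return the `budget` missing cells with the highest uncertainty x influence score."""
    patterns = Counter(tuple(r[a] for a in ORDER) for r in rows)
    scored = []
    for i, r in enumerate(rows):
        # Hypothetical influence proxy: how many rows share this exact (partially missing) pattern.
        influence = patterns[tuple(r[a] for a in ORDER)]
        for a in ORDER:
            if r[a] is None:
                scored.append((entropy(posterior(cpts, r, a)) * influence, i, a))
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(i, a) for _, i, a in scored[:budget]]

def impute(rows, cpts, crowd_answers):
    """Apply crowd answers first, then fill each remaining cell with its MAP value."""
    out = [dict(r) for r in rows]
    for (i, a), v in crowd_answers.items():
        out[i][a] = v
    for r in out:
        evidence = dict(r)  # posteriors use only observed and crowd-supplied values
        for a in ORDER:
            if evidence[a] is None:
                dist = posterior(cpts, evidence, a)
                r[a] = max(dist, key=dist.get)
    return out

# Hypothetical toy dataset: six complete rows and three incomplete ones.
TOY_ROWS = [dict(zip(ORDER, t)) for t in [
    ("x", "p", "u"), ("x", "p", "u"), ("x", "q", "u"),
    ("y", "q", "v"), ("y", "q", "v"), ("y", "p", "v"),
    (None, "p", None), (None, "q", "v"), ("x", None, "u"),
]]

if __name__ == "__main__":
    cpts = fit_cpts(TOY_ROWS)
    print(select_tasks(TOY_ROWS, cpts, budget=1))
    # Simulated crowd: a hypothetical worker answer for cell (row 6, attribute C).
    print(impute(TOY_ROWS, cpts, {(6, "C"): "u"}))
```

Enumeration is exponential in the number of hidden attributes, so it only suits toy examples; a realistic system would use a dedicated inference algorithm (for example, variable elimination) and might re-estimate the network parameters once crowd answers arrive.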
Year: 2020
DOI: 10.1016/j.knosys.2019.105199
Venue: Knowledge-Based Systems
Keywords: Missing values, Bayesian network, Crowdsourcing
Field: Data mining, Data collection, Data quality, Bayesian inference, Crowdsourcing, Computer science, Outsourcing, Bayesian network, Artificial intelligence, Imputation (statistics), Missing data, Machine learning
DocType: Journal
Volume: 190
ISSN: 0950-7051
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name
Order
Citations
PageRank
Chen Ye184.16
Hongzhi Wang242173.72
Wenbo Lu300.68
Jianzhong Li46324.23