Title
Truth Discovery via Exploiting Implications from Multi-Source Data
Abstract
Data veracity is a grand challenge for various tasks on the Web. Since Web data sources are inherently unreliable and may provide conflicting information about the same real-world entities, truth discovery has emerged as a countermeasure that resolves such conflicts by identifying, from multi-source data, the truth that conforms to reality. A major challenge for truth discovery is that different data items may have varying numbers of true values (multi-truth), which contradicts the assumption of existing truth discovery methods that each data item has exactly one true value. In this paper, we address this challenge by fully exploiting the implications from multi-source data. Specifically, we exploit three types of implications, namely implicit negative claims, the distribution of positive/negative claims, and the co-occurrence of values in sources' claims, to facilitate multi-truth discovery. We propose a probabilistic approach with improvement measures that incorporate the three implications in all stages of the truth discovery process. In particular, incorporating negative claims enables multi-truth discovery; considering the distribution of positive/negative claims frees truth discovery from the influence of sources' dataset-specific behavioral features; and considering the co-occurrence relationship among values compensates for the information lost when each value in the same claim is evaluated individually. Experimental results on three real-world datasets demonstrate the effectiveness of our approach.
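The abstract describes the approach only at a high level. As a rough illustration of the first implication (implicit negative claims), the following is a minimal sketch under stated assumptions, not the paper's model: it uses a generic sensitivity/specificity formulation in which a source that covers an item but omits one of its candidate values is counted as voting against that value, and it alternates between estimating value truth probabilities and re-estimating source quality. The function name `discover_multi_truth`, all of its parameters, and the pseudo-count smoothing are hypothetical. The sketch does not model the claim-distribution or co-occurrence implications.

```python
from collections import defaultdict


def discover_multi_truth(claims, n_iter=20, prior=0.5, init_sens=0.8,
                         init_spec=0.8, strength=2.0, threshold=0.5):
    """Hypothetical sketch, NOT the paper's algorithm.

    claims: {source: {item: set_of_claimed_values}}
    Returns (truths, prob): truths maps item -> set of values judged true;
    prob maps (item, value) -> estimated probability that the value is true.
    """
    # Candidate values of an item: every value any source claimed for it.
    candidates = defaultdict(set)
    for per_item in claims.values():
        for item, values in per_item.items():
            candidates[item] |= set(values)

    # A source covering an item makes a positive claim for each value it lists
    # and an implicit negative claim for every other candidate of that item.
    observations = defaultdict(list)  # (item, value) -> [(source, is_positive)]
    for source, per_item in claims.items():
        for item, values in per_item.items():
            for value in candidates[item]:
                observations[(item, value)].append((source, value in values))

    sens = {s: init_sens for s in claims}  # P(source lists v | v is true)
    spec = {s: init_spec for s in claims}  # P(source omits v | v is false)
    prob = {}
    for _ in range(n_iter):
        # Step 1: posterior truth probability of each candidate value.
        for key, obs in observations.items():
            p_true, p_false = prior, 1.0 - prior
            for s, positive in obs:
                if positive:
                    p_true *= sens[s]
                    p_false *= 1.0 - spec[s]
                else:
                    p_true *= 1.0 - sens[s]
                    p_false *= spec[s]
            prob[key] = p_true / (p_true + p_false)

        # Step 2: re-estimate sensitivity/specificity from soft counts,
        # smoothed toward the initial values by `strength` pseudo-observations
        # (an assumption) so the estimates stay strictly inside (0, 1).
        tp, true_mass = defaultdict(float), defaultdict(float)
        tn, false_mass = defaultdict(float), defaultdict(float)
        for key, obs in observations.items():
            p = prob[key]
            for s, positive in obs:
                true_mass[s] += p
                false_mass[s] += 1.0 - p
                if positive:
                    tp[s] += p
                else:
                    tn[s] += 1.0 - p
        for s in claims:
            sens[s] = (tp[s] + strength * init_sens) / (true_mass[s] + strength)
            spec[s] = (tn[s] + strength * init_spec) / (false_mass[s] + strength)

    # Multi-truth: keep every value whose probability reaches the threshold.
    truths = defaultdict(set)
    for (item, value), p in prob.items():
        if p >= threshold:
            truths[item].add(value)
    return dict(truths), prob


if __name__ == "__main__":
    claims = {
        "s1": {"book1": {"Alice", "Bob"}},
        "s2": {"book1": {"Alice", "Bob"}},
        "s3": {"book1": {"Alice", "Carol"}},
    }
    truths, prob = discover_multi_truth(claims)
    print(sorted(truths["book1"]))  # ['Alice', 'Bob']
```

In this toy run, "Bob" is kept even though one source omits it, and "Carol" is rejected because the two sources that agree with each other implicitly vote against it. Unlike a single-truth method, more than one value per item can be returned. The pseudo-count smoothing is a design choice of the sketch: with plain add-one smoothing, specificity cannot be learned when no negative claims exist and the probabilities drift toward 0.5.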
Year
2016
DOI
10.1145/2983323.2983791
Venue
ACM International Conference on Information and Knowledge Management
Keywords
Truth Discovery, Multiple True Values, Probabilistic Model, Imbalanced Claims
Field
Countermeasure, Data science, Multi source data, Data mining, Information retrieval, Computer science, Exploit, Statistical model, Probabilistic logic, Business process discovery
DocType
Conference
Citations
7
PageRank
0.45
References
16
Authors
7
Name | Order | Citations | PageRank
Xianzhi Wang | 1 | 276 | 40.32
Quan Z. Sheng | 2 | 3520 | 301.77
Lina Yao | 3 | 981 | 93.63
Xue Li | 4 | 2196 | 186.96
Xiu Susie Fang | 5 | 45 | 5.56
Xiaofei Xu | 6 | 408 | 70.26
Boualem Benatallah | 7 | 6174 | 494.38