Abstract | ||
---|---|---|
Data veracity is a grand challenge for many tasks on the Web. Because web data sources are inherently unreliable and may provide conflicting information about the same real-world entities, truth discovery has emerged as a countermeasure that resolves these conflicts by identifying the truth, i.e., the values that conform to reality, from multi-source data. A major challenge in truth discovery is that different data items may have varying numbers of true values (the multi-truth problem), which violates the assumption of existing truth discovery methods that each data item has exactly one true value. In this paper, we address this challenge by fully exploiting the implications of multi-source data. In particular, we exploit three types of implications, namely the implicit negative claims, the distribution of positive/negative claims, and the co-occurrence of values in sources' claims, to facilitate multi-truth discovery. We propose a probabilistic approach with improvement measures that incorporate the three implications in all stages of the truth discovery process. Specifically, incorporating the negative claims enables multi-truth discovery, considering the distribution of positive/negative claims relieves truth discovery from the impact of sources' behavioral features in specific datasets, and considering the co-occurrence of values compensates for the information lost by evaluating the values in the same claim individually. Experimental results on three real-world datasets demonstrate the effectiveness of our approach. |
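
The notion of an implicit negative claim in the abstract can be illustrated with a minimal Python sketch. It assumes one common reading of the idea: a source that provides values for a data item implicitly denies every candidate value for that item that it omits. The function name `derive_negative_claims`, the input layout, and the sample data are hypothetical and are not taken from the paper, which may define negative claims differently.

```python
from collections import defaultdict


def derive_negative_claims(positive_claims):
    """Derive implicit negative claims from positive ones.

    positive_claims maps (source, item) -> set of values the source claims.
    Assumption: a source covering an item denies all candidate values it omits.
    """
    # Candidate values of an item = union of values claimed by any source.
    candidates = defaultdict(set)
    for (_source, item), values in positive_claims.items():
        candidates[item] |= values

    # Values a source left out for an item it covers become its negative claims.
    return {
        (source, item): candidates[item] - values
        for (source, item), values in positive_claims.items()
    }


# Hypothetical example: three sources list the authors of the same book.
claims = {
    ("s1", "book_1.authors"): {"A", "B"},
    ("s2", "book_1.authors"): {"A"},
    ("s3", "book_1.authors"): {"A", "C"},
}
negatives = derive_negative_claims(claims)
```

Under this assumption, source `s2` positively claims only `A` and therefore implicitly denies `B` and `C`. Treating each value as separately affirmed or denied is what lets a data item end up with more than one true value.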
Year | DOI | Venue |
---|---|---|
2016 | 10.1145/2983323.2983791 | ACM International Conference on Information and Knowledge Management |
Keywords | Field | DocType |
Truth Discovery,Multiple True Values,Probabilistic Model,Imbalanced Claims | Countermeasure,Data science,Multi source data,Data mining,Information retrieval,Computer science,Exploit,Statistical model,Probabilistic logic,Business process discovery | Conference |
Citations | PageRank | References |
7 | 0.45 | 16 |
Authors | ||
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Xianzhi Wang | 1 | 276 | 40.32 |
Quan Z. Sheng | 2 | 3520 | 301.77 |
Lina Yao | 3 | 981 | 93.63 |
Xue Li | 4 | 2196 | 186.96 |
Xiu Susie Fang | 5 | 45 | 5.56 |
Xiaofei Xu | 6 | 408 | 70.26 |
Boualem Benatallah | 7 | 6174 | 494.38 |