Detecting Errors in Numerical Linked Data Using Cross-Checked Outlier Detection - Citegraph

Paper Info

Title
Detecting Errors in Numerical Linked Data Using Cross-Checked Outlier Detection

Abstract
Outlier detection used for identifying wrong values in data is typically applied to single datasets to search them for values of unexpected behavior. In this work, we instead propose an approach which combines the outcomes of two independent outlier detection runs to get a more reliable result and to also prevent problems arising from natural outliers which are exceptional values in the dataset but nevertheless correct. Linked Data is especially suited for the application of such an idea, since it provides large amounts of data enriched with hierarchical information and also contains explicit links between instances. In a first step, we apply outlier detection methods to the property values extracted from a single repository, using a novel approach for splitting the data into relevant subsets. For the second step, we exploit owl:sameAs links for the instances to get additional property values and perform a second outlier detection on these values. Doing so allows us to confirm or reject the assessment of a wrong value. Experiments on the DBpedia and NELL datasets demonstrate the feasibility of our approach.
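The two-step idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the median/MAD outlier test, the threshold of 3.5, the minimum of three linked values, and the names `is_outlier` and `cross_checked_errors` are all assumptions made for the example. It also leaves out the paper's step of splitting the data into relevant subsets. The values reached through owl:sameAs links are assumed to have been collected already into a plain dictionary.

```python
from statistics import median


def is_outlier(value, reference, threshold=3.5):
    """Modified z-score test (median/MAD) of `value` against `reference`.

    Assumption: the abstract does not name a specific outlier test;
    this one is used only to keep the sketch short.
    """
    med = median(reference)
    mad = median([abs(v - med) for v in reference])
    if mad == 0:
        # All reference values agree; any deviation counts as an outlier.
        return value != med
    return abs(0.6745 * (value - med) / mad) > threshold


def cross_checked_errors(population, linked_values, min_linked=3):
    """Return instances flagged in both outlier detection runs.

    population:    {instance: value} for one property in one repository.
    linked_values: {instance: [values of the same property taken from
                    resources linked via owl:sameAs]} (hypothetical input).
    """
    values = list(population.values())
    confirmed = []
    for instance, value in population.items():
        # Step 1: outlier detection within the single repository.
        if not is_outlier(value, values):
            continue
        # Step 2: cross-check against values from linked resources.
        others = linked_values.get(instance, [])
        if len(others) < min_linked:
            continue  # too little evidence to confirm or reject
        if is_outlier(value, others):
            confirmed.append(instance)  # both runs agree: likely wrong
        # Otherwise the value is treated as a natural outlier.
    return confirmed
```

A value that is unusual within its own repository but agrees with the values of linked resources is kept as a natural outlier; only values that deviate in both runs are reported.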

Year: 2014
DOI: 10.1007/978-3-319-11964-9_23
Venue: Semantic Web Conference
Keywords: data debugging, data quality, linked data, outlier detection
Field: Anomaly detection, Data mining, Data quality, Computer science, Outlier, Linked data, Exploit
DocType: Conference
Volume: 8796
ISSN: 0302-9743
Citations: 13
PageRank: 0.81
References: 16
Authors: 5

Authors (5 rows)

Name	Order	Citations	PageRank
Daniel Fleischhacker	1	70	5.40
Heiko Paulheim	2	1095	84.19
Volha Bryl	3	180	14.46
Johanna Völker	4	483	28.71
Christian Bizer	5	8448	524.93
