Title
User-driven quality evaluation of DBpedia
Abstract
Linked Open Data (LOD) comprises an unprecedented volume of structured datasets on the Web. However, these datasets are of varying quality, ranging from extensively curated datasets to crowdsourced and even extracted data of relatively low quality. We present a methodology for assessing the quality of linked data resources, which comprises a manual and a semi-automatic process. The first phase includes the detection of common quality problems and their representation in a quality problem taxonomy. In the manual process, the second phase comprises the evaluation of a large number of individual resources, according to the quality problem taxonomy, via crowdsourcing. This process is accompanied by a tool wherein a user assesses an individual resource and evaluates each fact for correctness. The semi-automatic process involves the generation and verification of schema axioms. We report the results obtained by applying this methodology to DBpedia. We identified 17 data quality problem types, and 58 users assessed a total of 521 resources. Overall, 11.93% of the evaluated DBpedia triples were found to have quality issues. Applying the semi-automatic component yielded a total of 222,982 triples that have a high probability of being incorrect. In particular, we found that incorrectly extracted object values, irrelevant extraction of information, and broken links were the most frequently recurring quality problems. With this study, we not only aim to assess the quality of this sample of DBpedia resources but also adopt an agile methodology to improve the quality in future versions by regularly providing feedback to the DBpedia maintainers.
Year
2013
DOI
10.1145/2506182.2506195
Venue
I-SEMANTICS
Keywords
quality issue,dbpedia maintainers,data quality problem type,quality problem,low quality,individual resource,semi-automatic process,user-driven quality evaluation,common quality problem,quality problem taxonomy,varying quality,evaluation,extraction,rdf,data quality
Field
Data mining,Data quality,Information retrieval,Axiom,Crowdsourcing,Computer science,Correctness,Linked data,Agile software development,Schema (psychology),RDF
DocType
Conference
Citations
63
PageRank
2.73
References
17
Authors
7
Name                  Order  Citations  PageRank
Amrapali Zaveri       1      368        24.37
Dimitris Kontokostas  2      490        31.79
Mohamed A. Sherif     3      63         2.73
Lorenz Bühmann        4      603        31.20
Mohamed Morsey        5      405        17.67
Sören Auer            6      5711       418.56
Jens Lehmann          7      5375       355.08