Title
Fusing time-dependent web table data
Abstract
A subset of the HTML tables on the Web contains relational data. The data in these tables covers a multitude of topics and is thus very useful for complementing or validating cross-domain knowledge bases, such as DBpedia, YAGO, or the Google Knowledge Graph. A large fraction of the data in these knowledge bases is time-dependent, meaning that the correctness of an attribute value depends on a point in time. Fusing data from web tables to determine correct values for time-dependent attributes is challenging, as most web tables do not contain timestamp information. One way to deal with this sparsity is to exploit timestamps that appear in different locations on the web page around the table. But as these timestamps might not apply to the web table value in question, this approach introduces noise. This paper investigates the extent to which the performance of data fusion strategies that rely on voting, PageRank, and Knowledge-Based Trust can be improved by incorporating noisy and sparse timestamp information. For this, we present a machine-learning-based approach that considers different types of noisy timestamps in the data fusion process, and we experiment with propagating timestamp information between web tables in order to overcome sparsity. We evaluate the data fusion strategies using a large public corpus of web tables and a public gold standard of time-dependent attribute values. We find that our methods effectively choose and weigh timestamp information per attribute and reduce sparsity using propagation. By incorporating timestamp information into data fusion strategies that previously did not exploit temporal meta information, we are able to increase the F1-measure by 5% on average.
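The abstract does not spell out the fusion model. As a rough illustration only, the sketch below shows one simple way a voting strategy could take noisy, sparse timestamps into account. It is not the authors' machine-learning approach: the names `Claim` and `time_aware_vote`, the exponential decay by year distance, the fixed weight for claims without a timestamp, and the example values are all hypothetical choices made for this sketch.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """One value claimed by a web table for an (entity, attribute) pair."""
    value: str
    year: Optional[int]  # noisy timestamp found near the table, or None if absent


def time_aware_vote(
    claims: list[Claim],
    target_year: int,
    decay: float = 0.5,
    untimestamped_weight: float = 0.2,
) -> Optional[str]:
    """Return the value with the highest weighted vote for target_year.

    Hypothetical weighting: a timestamped claim contributes
    decay ** |year - target_year|, so claims dated close to the target
    year dominate; a claim without a timestamp contributes a small
    constant weight, so plain majority voting still matters.
    """
    scores: dict[str, float] = defaultdict(float)
    for claim in claims:
        if claim.year is None:
            scores[claim.value] += untimestamped_weight
        else:
            scores[claim.value] += decay ** abs(claim.year - target_year)
    if not scores:
        return None
    # max() returns the first maximal key, so ties favor the value seen first
    return max(scores, key=scores.get)


# Hypothetical population claims for one city, gathered from five web tables
claims = [
    Claim("2.1 million", 2010),
    Claim("2.3 million", 2014),
    Claim("2.3 million", None),
    Claim("2.1 million", None),
    Claim("2.1 million", None),
]
print(time_aware_vote(claims, target_year=2014))  # 2.3 million
print(time_aware_vote(claims, target_year=2010))  # 2.1 million
```

With plain majority voting, the value claimed by three tables would win for every year; weighting by timestamp proximity lets the single table dated 2014 override it for that year, while untimestamped claims still contribute. The paper's approach instead learns how to choose and weigh timestamp information per attribute, whereas here the decay and default weight are fixed by hand.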
Year: 2016
DOI: 10.1145/2932194.2932197
Venue: WebDB
Field: PageRank, Data mining, Information retrieval, Relational database, Computer science, Correctness, SPARQL, Sensor fusion, Exploit, Timestamp, Database, RDF
DocType: Conference
Citations: 0
PageRank: 0.34
References: 16
Authors: 3
Name | Order | Citations | PageRank
Yaser Oulabi | 1 | 0 | 0.68
Robert Meusel | 2 | 234 | 16.62
Christian Bizer | 3 | 8448 | 524.93