Title
Web table column categorisation and profiling.
Abstract
Relational tables collected from HTML pages ("web tables") are used for a variety of tasks including table extension, knowledge base completion, and data transformation. Most of the existing algorithms for these tasks assume that the data in the tables has the form of binary relations, i.e., relates a single entity to a value or to another entity. Our exploration of a large public corpus of web tables, however, shows that web tables contain a large fraction of non-binary relations which will likely be misinterpreted by the state-of-the-art algorithms. In this paper, we propose a categorisation scheme for web table columns which distinguishes the different types of relations that appear in tables on the Web and may help to design algorithms which better deal with these different types. Designing an automated classifier that can distinguish between different types of relations is non-trivial, because web tables are relatively small, contain a high level of noise, and often miss partial key values. In order to be able to perform this distinction, we propose a set of features which goes beyond probabilistic functional dependencies by using the union of multiple tables from the same web site and from different web sites to overcome the problem that single web tables are too small for the reliable calculation of functional dependencies.
Year: 2016
DOI: 10.1145/2932194.2932198
Venue: WebDB
Field: Data mining, Information retrieval, Computer science, Binary relation, SPARQL, Web modeling, Social Semantic Web, Knowledge base, Probabilistic logic, Table (information), RDF, Database
DocType: Conference
Citations: 1
PageRank: 0.35
References: 17
Authors: 2
Name | Order | Citations | PageRank
Oliver Lehmberg | 1 | 179 | 9.59
Christian Bizer | 2 | 8448 | 524.93