Title
Annotating Web Tables with the Crowd
Abstract
The Web contains a large number of structured tables, most of which lack header rows. Algorithmic approaches have been proposed to recover the semantics of web tables by annotating column labels and identifying subject columns. However, state-of-the-art techniques do not yet achieve satisfactory accuracy and recall. In this paper, we present a hybrid machine-crowdsourcing framework that leverages human intelligence to improve the performance of web table annotation. In this framework, machine-based algorithms prompt human workers with candidate lists of concepts, while an improved K-means algorithm based on a novel integrative distance minimizes the number of tuples posed to the crowd. To recommend the most relevant tasks to human workers and determine the final answers more accurately, an evaluation mechanism is also implemented based on Answer Credibility, which measures the probability that a worker's intuitive answer is the final answer for a task. The results of extensive experiments conducted on real-world datasets show that our framework significantly improves annotation accuracy and time efficiency for web tables, and that our task reduction and answer evaluation mechanisms are effective and efficient at improving answer quality.
Year
2018
DOI
10.4149/cai_2018_4_969
Venue
COMPUTING AND INFORMATICS
Keywords
Crowdsourcing, semantic recovery, web tables, information integration
Field
World Wide Web, Computer science, Theoretical computer science, Web tables
DocType
Journal
Volume
37
Issue
4
ISSN
1335-9150
Citations
0
PageRank
0.34
References
0
Authors
2
Name, Order, Citations, PageRank
Ning Wang, 1, 3, 8.48
Huaxi Liu, 2, 1, 1.02