Title
Annotating Web Tables with the Crowd
Abstract
The Web contains a large number of structured tables, most of which lack header rows. Algorithmic approaches have been proposed to recover the semantics of web tables by annotating column labels and identifying subject columns. However, state-of-the-art techniques do not yet achieve satisfactory accuracy and recall. In this paper, we present a hybrid machine-crowdsourcing framework that leverages human intelligence to improve the performance of web table annotation. In this framework, machine-based algorithms prompt human workers with candidate lists of concepts, while an improved K-means algorithm based on a novel integrative distance minimizes the number of tuples posed to the crowd. To recommend the most relevant tasks to human workers and determine the final answers more accurately, an evaluation mechanism is also implemented based on Answer Credibility, which measures the probability that a worker's intuitive answer is the final answer for a task. The results of extensive experiments conducted on real-world datasets show that our framework significantly improves annotation accuracy and time efficiency for web tables, and that our task reduction and answer evaluation mechanisms are effective and efficient at improving answer quality.
Year
2018
DOI
10.4149/cai_2018_4_969
Venue
COMPUTING AND INFORMATICS
Keywords
Crowdsourcing, semantic recovery, web tables, information integration
Field
World Wide Web, Computer science, Theoretical computer science, Web tables
DocType
Journal
Volume
37
Issue
4
ISSN
1335-9150
Citations
0
PageRank
0.34
References
0
Authors
2
Name, Order, Citations, PageRank
Ning Wang, 1, 3, 8.48
Huaxi Liu, 2, 1, 1.02