Making Sense of Entities and Quantities in Web Tables - Citegraph

Paper Info

Title
Making Sense of Entities and Quantities in Web Tables

Abstract
HTML tables and spreadsheets on the Internet or in enterprise intranets usually contain valuable information, but are created ad hoc. As a result, they often lack both systematic names for column headers and a clear vocabulary for cell values. This limits the re-use of such tables and creates a huge heterogeneity problem when comparing or aggregating multiple tables. This paper aims to overcome this problem by automatically canonicalizing header names and cell values onto concepts, classes, entities and uniquely represented quantities registered in a knowledge base. To this end, we devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities and quantities. We give specific consideration to quantities, which are mapped into (measure, dimension, unit, magnitude) quadruples over a taxonomy of physical (e.g. power consumption), monetary (e.g. revenue), temporal (e.g. date) and dimensionless (i.e. counts) measures. Our experiments with Web tables from diverse domains demonstrate the viability of our method and its benefits over baselines.

Year	DOI	Venue
2016	10.1145/2983323.2983772	Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
Keywords	Field	DocType
Information extraction,Data understanding,Data integration and aggregation	Data mining,Information retrieval,Computer science,Information extraction,Knowledge base,Probabilistic logic,Graphical model,Header,Vocabulary,Table (information),The Internet	Conference
Citations	PageRank	References
9	0.48	27
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Yusra Ibrahim	1	22	3.10
Mirek Riedewald	2	1136	84.31
Gerhard Weikum	3	12710	2146.01
