Title: Making Sense of Entities and Quantities in Web Tables
Abstract
HTML tables and spreadsheets on the Internet or in enterprise intranets often contain valuable information, but they are created ad hoc. As a result, they frequently lack both systematic names for column headers and a clear vocabulary for cell values. This limits the reuse of such tables and creates a severe heterogeneity problem when comparing or aggregating multiple tables. This paper addresses the problem by automatically canonicalizing header names and cell values onto concepts, classes, entities, and uniquely represented quantities registered in a knowledge base. To this end, we devise a probabilistic graphical model that captures coherence dependencies between cells in tables and candidate items in the space of concepts, entities, and quantities. We give specific consideration to quantities, which are mapped to (measure, dimension, unit, magnitude) quadruples over a taxonomy of physical (e.g. power consumption), monetary (e.g. revenue), temporal (e.g. date), and dimensionless (i.e. counts) measures. Our experiments with Web tables from diverse domains demonstrate the viability of our method and its benefits over baselines.
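To make the (measure, dimension, unit, magnitude) quadruple concrete, the sketch below parses a single cell such as "1.5 kW" into that form and rescales the magnitude to a canonical unit. It is an illustration under stated assumptions, not the paper's method: the unit registry `UNITS`, the function `canonicalize`, and the regular expression are hypothetical. The measure label is supplied by the caller rather than inferred, and the probabilistic, coherence-based inference described in the abstract is not modeled.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quantity:
    measure: str      # semantic measure, e.g. "power consumption" (given by caller here)
    dimension: str    # physical/monetary/dimensionless dimension, e.g. "power"
    unit: str         # canonical unit after normalization
    magnitude: float  # value expressed in the canonical unit


# Hypothetical unit registry: surface form -> (dimension, canonical unit, scale factor).
UNITS = {
    "W": ("power", "W", 1.0),
    "kW": ("power", "W", 1e3),
    "MW": ("power", "W", 1e6),
    "USD": ("money", "USD", 1.0),
    "$": ("money", "USD", 1.0),
    "": ("count", "1", 1.0),  # bare numbers are treated as dimensionless counts
}

# Optional "$" prefix, a number with optional thousands separators and decimals,
# then an optional alphabetic unit suffix.
CELL_PATTERN = re.compile(r"^\s*(\$)?\s*([0-9][0-9,]*(?:\.[0-9]+)?)\s*([A-Za-z]*)\s*$")


def canonicalize(cell: str, measure: str) -> Optional[Quantity]:
    """Map a raw table cell to a quantity quadruple, or None if it cannot be parsed."""
    match = CELL_PATTERN.match(cell)
    if match is None:
        return None
    prefix, number, suffix = match.groups()
    surface = prefix or suffix  # "$" prefix takes precedence over a suffix
    if surface not in UNITS:
        return None  # unknown unit: leave the cell unresolved
    dimension, unit, factor = UNITS[surface]
    value = float(number.replace(",", "")) * factor
    return Quantity(measure, dimension, unit, value)


if __name__ == "__main__":
    print(canonicalize("1.5 kW", "power consumption"))
    print(canonicalize("$1,200", "revenue"))
```

Normalizing every cell of a column to one canonical unit is what allows values written as "1.5 kW" and "1500 W" to be compared or aggregated directly; in the paper's setting, choosing the measure and unit jointly across cells is the job of the graphical model rather than a fixed lookup table.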
Year: 2016
DOI: 10.1145/2983323.2983772
Venue: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
Keywords: Information extraction, Data understanding, Data integration and aggregation
Field: Data mining, Information retrieval, Computer science, Information extraction, Knowledge base, Probabilistic logic, Graphical model, Header, Vocabulary, Table (information), The Internet
DocType: Conference
Citations: 9
PageRank: 0.48
References: 27
Authors: 3
Name
Order
Citations
PageRank
Yusra Ibrahim1223.10
Mirek Riedewald2113684.31
Gerhard Weikum3127102146.01