WebSets: extracting sets of entities from the web using unsupervised information extraction - Citegraph

Paper Info

Title
WebSets: extracting sets of entities from the web using unsupervised information extraction

Abstract
We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. Most earlier approaches to this problem rely on combining clusters of distributionally similar terms and concept-instance pairs obtained with Hearst patterns. In contrast, our method relies on a novel approach for clustering terms found in HTML tables, and then assigning concept names to these clusters using Hearst patterns. The method can be efficiently applied to a large corpus, and experimental results on several datasets show that our method can accurately extract large numbers of concept-instance pairs.
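The abstract describes two steps: cluster terms that appear in HTML tables, then name each cluster using Hearst patterns. The toy sketch below illustrates that two-step shape only and is not the authors' algorithm. It assumes each table column has already been reduced to a set of lowercase cell strings. It replaces the paper's table-term clustering with a simple greedy Jaccard merge of columns (threshold 0.3, an arbitrary choice). It uses a single "X such as A, B and C" pattern. All data and names (`COLUMNS`, `SENTENCES`, `cluster_columns`, `hearst_pairs`, `label_clusters`) are hypothetical.

```python
import re
from collections import Counter

# Toy input (hypothetical): each HTML table column reduced to a set of
# normalized cell strings. Real input would come from parsed <table> elements.
COLUMNS = [
    {"paris", "london", "berlin"},
    {"london", "berlin", "madrid"},
    {"red", "green", "blue"},
    {"green", "blue", "yellow"},
]

# Toy sentences (hypothetical) from the same corpus, searched for Hearst patterns.
SENTENCES = [
    "European cities such as Paris, London and Berlin attract tourists.",
    "Capitals such as Madrid and Berlin are large.",
    "Primary colors such as red, blue, and yellow mix well.",
]

# A list item is one word that is not the coordinating "and"/"or".
ITEM = r"(?!(?:and|or)\b)\w+"
# "<concept> such as A, B, and C": one of the classic Hearst patterns.
SUCH_AS = re.compile(rf"\b(\w+) such as ((?:{ITEM}, )*{ITEM}(?:,? (?:and|or) {ITEM})?)")
# Separators inside the matched list; the ", and" form is tried first.
LIST_SEP = re.compile(r",?\s+(?:and|or)\s+|,\s*")


def jaccard(a, b):
    """Overlap between two term sets, in [0, 1]."""
    return len(a & b) / len(a | b)


def cluster_columns(columns, threshold=0.3):
    """Greedy single pass: merge each column into the first cluster it overlaps enough."""
    clusters = []
    for col in columns:
        for cluster in clusters:
            if jaccard(col, cluster) >= threshold:
                cluster |= col  # grow the existing cluster in place
                break
        else:
            clusters.append(set(col))  # copy, so the input column is not mutated
    return clusters


def hearst_pairs(sentences):
    """Extract (concept, instance) pairs using the single 'such as' pattern."""
    pairs = []
    for sentence in sentences:
        for m in SUCH_AS.finditer(sentence.lower()):
            concept = m.group(1)
            for instance in LIST_SEP.split(m.group(2)):
                if instance:
                    pairs.append((concept, instance))
    return pairs


def label_clusters(clusters, pairs):
    """Name each cluster with the concept that has the most instances inside it."""
    labeled = []
    for cluster in clusters:
        votes = Counter(concept for concept, inst in pairs if inst in cluster)
        label = votes.most_common(1)[0][0] if votes else None
        labeled.append((label, sorted(cluster)))
    return labeled


if __name__ == "__main__":
    clusters = cluster_columns(COLUMNS)
    for label, members in label_clusters(clusters, hearst_pairs(SENTENCES)):
        print(label, members)
```

Run as a script, this prints `cities ['berlin', 'london', 'madrid', 'paris']` and `colors ['blue', 'green', 'red', 'yellow']`. In this toy run, "green" never appears in a Hearst match. It still receives the label "colors" because the table clustering placed it with terms that do. This shows how naming clusters, rather than relying on individual pattern matches, can yield additional concept-instance pairs.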

Year	DOI	Venue
2012	10.1145/2124295.2124327	Proceedings of the fifth ACM international conference on Web search and data mining
Keywords	DocType	Volume
assigning concept name,open-domain information extraction method,distributionally similar term,large number,html table,large corpus,hearst pattern,concept-instance pair,unsupervised information extraction,clustering term,html corpus,information extraction,clustering,web mining	Conference	abs/1307.0261
Citations	PageRank	References
47	1.24	23
Authors
3

Name	Author order	Author citations	Author PageRank
Bhavana Bharat Dalvi	1	201	17.31
William W. Cohen	2	10178	1243.74
James P. Callan	3	6237	833.28