Paper Info

Title
Labeling data extracted from the web

Abstract
We consider finding descriptive labels for anonymous, structured datasets, such as those produced by state-of-the-art Web wrappers. We give a probabilistic model to estimate the affinity between attributes and labels, and describe a method that uses a Web search engine to populate the model. We discuss a method for finding good candidate labels for unlabeled datasets. Ours is the first unsupervised labeling method that does not rely on mining the HTML pages containing the data. Experimental results with data from 8 different domains show that our methods achieve high accuracy even with very few search engine accesses.

Year	Venue	Keywords
2007	OTM Conferences (1)	unlabeled datasets, probabilistic model, search engine access, web search engine, state-of-the-art web wrapper, descriptive label, different domain, html page, structured datasets, search engine
Field	DocType	Volume
Web search engine, Data mining, World Wide Web, Search engine, Information retrieval, Computer science, Database search engine, Statistical model, Search analytics, RDF	Conference	4803
ISSN	ISBN	Citations
0302-9743	3-540-76846-7	6
PageRank	References	Authors
0.46	13	4

Authors (4 rows)

Name	Order	Citations	PageRank
Altigran S. da Silva	1	938	50.30
Denilson Barbosa	2	610	43.52
Joao M. B. Cavalcanti	3	137	9.72
Marco A. S. Sevalho	4	6	0.46
