Title
Labeling data extracted from the web
Abstract
We consider finding descriptive labels for anonymous, structured datasets, such as those produced by state-of-the-art Web wrappers. We give a probabilistic model to estimate the affinity between attributes and labels, and describe a method that uses a Web search engine to populate the model. We discuss a method for finding good candidate labels for unlabeled datasets. Ours is the first unsupervised labeling method that does not rely on mining the HTML pages containing the data. Experimental results with data from 8 different domains show that our methods achieve high accuracy even with very few search engine accesses.
Year
2007
Venue
OTM Conferences (1)
Keywords
unlabeled datasets, probabilistic model, search engine access, web search engine, state-of-the-art web wrapper, descriptive label, different domain, html page, structured datasets, search engine
Field
Web search engine, Data mining, World Wide Web, Search engine, Information retrieval, Computer science, Database search engine, Statistical model, Search analytics, RDF
DocType
Conference
Volume
4803
ISSN
0302-9743
ISBN
3-540-76846-7
Citations
6
PageRank
0.46
References
13
Authors
4
Name | Order | Citations | PageRank
Altigran S. da Silva | 1 | 938 | 50.30
Denilson Barbosa | 2 | 610 | 43.52
Joao M. B. Cavalcanti | 3 | 137 | 9.72
Marco A. S. Sevalho | 4 | 6 | 0.46