Abstract |
---|
We consider finding descriptive labels for anonymous, structured datasets, such as those produced by state-of-the-art Web wrappers. We give a probabilistic model to estimate the affinity between attributes and labels, and describe a method that uses a Web search engine to populate the model. We discuss a method for finding good candidate labels for unlabeled datasets. Ours is the first unsupervised labeling method that does not rely on mining the HTML pages containing the data. Experimental results with data from 8 different domains show that our methods achieve high accuracy even with very few search engine accesses. |
Year | Venue | Keywords |
---|---|---|
2007 | OTM Conferences (1) | unlabeled datasets, probabilistic model, search engine access, web search engine, state-of-the-art web wrapper, descriptive label, different domain, html page, structured datasets, search engine |
Field | DocType | Volume |
Web search engine, Data mining, World Wide Web, Search engine, Information retrieval, Computer science, Database search engine, Statistical model, Search analytics, RDF | Conference | 4803 |
ISSN | ISBN | Citations |
0302-9743 | 3-540-76846-7 | 6 |
PageRank | References | Authors |
0.46 | 13 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Altigran S. da Silva | 1 | 938 | 50.30 |
Denilson Barbosa | 2 | 610 | 43.52 |
Joao M. B. Cavalcanti | 3 | 137 | 9.72 |
Marco A. S. Sevalho | 4 | 6 | 0.46 |