Title
Overview of AutoFeed: an unsupervised learning system for generating webfeeds
Abstract
The AutoFeed system automatically extracts data from semistructured web sites. Previously, researchers have developed two types of supervised learning approaches for extracting web data: methods that create precise, site-specific extraction rules and methods that learn less precise, site-independent extraction rules. In either case, significant training is required. AutoFeed follows a third, more ambitious approach, in which unsupervised learning is used to analyze sites and discover their structure. Our method relies on a set of heterogeneous "experts", each of which is capable of identifying certain types of generic structure. Each expert represents its discoveries as "hints". Based on these hints, our system clusters the pages and identifies semistructured data that can be extracted. To identify a good clustering, we use a probabilistic model of the hint-generation process. This paper summarizes our formulation of the fully automatic web-extraction problem, our clustering approach, and our results on a set of experiments.
Year
2006
Venue
AAAI
Keywords
generic structure, ambitious approach, autofeed system, web data, less-precise site-independent extraction rule, clustering approach, semistructured web site, semi-structured data, good clustering, unsupervised learning system, extracts data, semi structured data, unsupervised learning, probabilistic model, supervised learning
Field
Semi-supervised learning, Computer science, Supervised learning, Unsupervised learning, Statistical model, Artificial intelligence, Conceptual clustering, Cluster analysis, Machine learning
DocType
Conference
Citations
5
PageRank
0.50
References
14
Authors
2
Name            Order  Citations  PageRank
Bora Gazen      1      15         1.54
Steven Minton   2      3473       536.74