Title
Fully automatic wrapper generation for search engines
Abstract
When a query is submitted to a search engine, the search engine returns a dynamically generated result page containing the result records, each of which usually consists of a link to and/or snippet of a retrieved Web page. In addition, such a result page often also contains information irrelevant to the query, such as information related to the hosting site of the search engine and advertisements. In this paper, we present a technique for automatically producing wrappers that can be used to extract search result records from dynamically generated result pages returned by search engines. Automatic search result record extraction is very important for many applications that need to interact with search engines such as automatic construction and maintenance of metasearch engines and deep Web crawling. The novel aspect of the proposed technique is that it utilizes both the visual content features on the result page as displayed on a browser and the HTML tag structures of the HTML source file of the result page. Experimental results indicate that this technique can achieve very high extraction accuracy.
Year: 2005
DOI: 10.1145/1060745.1060760
Venue: WWW
Keywords: result record, search result record, automatic wrapper generation, proposed technique, html source file, automatic search result record, search engine, web page, result page, metasearch engine, deep web, information extraction, web pages
Field: Web search engine, Static web page, Web search query, Metasearch engine, Organic search, World Wide Web, Information retrieval, Computer science, Search engine indexing, Search analytics, Database, Spamdexing
DocType: Conference
ISBN: 1-59593-046-9
Citations: 159
PageRank: 5.56
References: 26
Authors: 5
Name                 Order  Citations  PageRank
L. Gravano           1      5668       855.47
Weiyi Meng           2      2722       514.77
Zonghuan Wu          3      492        27.08
Vijay V. Raghavan    4      2544       506.92
Clement T. Yu        5      3171       1419.96