Title
Wrapper Generation for Web Accessible Data Sources
Abstract
A growing number of data sources can be queried across the WWW. Such sources typically support HTML forms-based interfaces, and search engines query collections of suitably indexed data. The data is displayed via a browser. This raises three problems. First, there is no standard programming interface suitable for applications to submit queries. Second, the output (the answer to a query) is not well structured: structured objects have to be extracted from HTML documents that contain irrelevant data and may be volatile. Third, domain knowledge about the data source is also embedded in HTML documents and must be extracted. To solve these problems, we present technology to define and (automatically) generate wrappers for Web accessible sources. Our contributions are as follows: (1) Defining a wrapper interface to specify the capability of Web accessible data sources. (2) Developing a wrapper generation toolkit of graphical interfaces and specification languages to specify the capability of sources and the functionality of the wrapper. (3) Developing the technology to automatically generate a wrapper appropriate to the Web accessible source, from the specifications.
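To make the wrapper idea concrete, the following is a minimal sketch in Python. It is not the toolkit or specification language described in the abstract: the names `RowExtractor`, `Wrapper`, `can_answer`, `extract`, and the sample `PAGE` are all hypothetical, and the sketch assumes the source returns its answers as rows of an HTML table. It illustrates two of the points above: declaring which attributes a source can be queried on, and extracting structured objects while skipping irrelevant page content.

```python
from html.parser import HTMLParser


class RowExtractor(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being parsed, or None
        self._cell = None   # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a <td> (headings, banners, <th> labels) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # skip rows that had no <td> cells (header rows)
                self.rows.append(self._row)
            self._row = None


class Wrapper:
    """Hypothetical wrapper: declares a capability, returns structured objects."""

    def __init__(self, fields, queryable):
        self.fields = fields        # attribute names of the output objects
        self.queryable = queryable  # attributes the source's form accepts

    def can_answer(self, query):
        # A query is answerable if it only constrains queryable attributes.
        return set(query) <= set(self.queryable)

    def extract(self, html):
        parser = RowExtractor()
        parser.feed(html)
        parser.close()
        # Keep only rows whose cell count matches the declared fields.
        return [dict(zip(self.fields, row)) for row in parser.rows
                if len(row) == len(self.fields)]


# Assumed sample result page; real sources will differ and may change.
PAGE = """
<html><body><h1>Results</h1><p>Sponsored: irrelevant banner text</p>
<table>
<tr><th>Title</th><th>Year</th></tr>
<tr><td>Wrapper Generation</td><td>1998</td></tr>
<tr><td>Another Paper</td><td>1997</td></tr>
</table></body></html>
"""

wrapper = Wrapper(fields=["title", "year"], queryable=["title"])
objects = wrapper.extract(PAGE)
```

In this sketch the extraction rule is hard-coded in `RowExtractor`; the point of generating wrappers from specifications, as the abstract proposes, is that such rules would not need to be written by hand for each source.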
Year
1998
DOI
10.1109/COOPIS.1998.706180
Venue
IEE Proceedings I Communications, Speech and Vision
Keywords
web accessible data sources, web accessible data source, graphical interface, html document, html forms-based interface, wrapper generation, wrapper interface, wrapper generation toolkit, web accessible source, irrelevant data, present technology, data source, domain knowledge, internet, search engine, indexation, web accessibility, specification language
Field
Drawback, Data source, World Wide Web, Search engine, Information retrieval, Domain knowledge, Computer science, The Internet
DocType
Conference
ISBN
0-8186-8380-5
Citations
53
PageRank
8.37
References
13
Authors
4
Name | Order | Citations | PageRank
Jean-Robert Gruser | 1 | 180 | 42.37
Louiqa Raschid | 2 | 1522 | 417.56
Maria Esther Vidal | 3 | 787 | 95.93
Laura Bright | 4 | 176 | 17.34