Abstract |
---|
We study the problem of automatically repairing wrappers for Web information providers. The majority of Web wrappers use "hooks" or "landmarks" to locate and extract relevant information from Web pages, and such wrappers often become inoperable when the page structure changes. The solution we propose in this paper extends conventional forward wrappers with two alternatives: classifiers built from content features of the extracted information, and wrappers that process pages backward. We report preliminary results on information extraction recovery and wrapper repair for a set of real changes made by Web providers. |
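The recovery idea in the abstract can be illustrated with a minimal Python sketch. A forward wrapper is tried first, then a backward wrapper, and each candidate is checked against content features. The sketch assumes plain string landmarks, and every name in it (`ForwardRule`, `BackwardRule`, `looks_like_price`, `extract_with_recovery`) is hypothetical. The regex check only stands in for the content-feature classifiers and is not the paper's method.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ForwardRule:
    """Hypothetical forward rule: scan left to right for `left`, then take text up to `right`."""
    left: str
    right: str

    def extract(self, page: str) -> Optional[str]:
        start = page.find(self.left)
        if start < 0:
            return None  # left landmark missing, e.g. after a layout change
        start += len(self.left)
        end = page.find(self.right, start)
        return page[start:end] if end >= 0 else None


@dataclass
class BackwardRule:
    """Hypothetical backward rule: scan right to left for `right`, then back to `left`."""
    left: str
    right: str

    def extract(self, page: str) -> Optional[str]:
        end = page.rfind(self.right)
        if end < 0:
            return None
        # Closest `left` landmark that ends before the `right` landmark.
        start = page.rfind(self.left, 0, end)
        return page[start + len(self.left):end] if start >= 0 else None


def looks_like_price(value: str) -> bool:
    """Stand-in content classifier (assumption): accepts strings such as '$12.50'."""
    return re.fullmatch(r"\$\d+(\.\d{2})?", value) is not None


def extract_with_recovery(
    page: str,
    forward: ForwardRule,
    backward: BackwardRule,
    is_valid: Callable[[str], bool],
) -> Optional[str]:
    # Try the forward wrapper first, then the backward one, and keep the
    # first candidate that the content classifier accepts.
    for candidate in (forward.extract(page), backward.extract(page)):
        if candidate is not None and is_valid(candidate.strip()):
            return candidate.strip()
    return None  # both wrappers failed: a signal that the wrapper needs repair
```

For example, take `ForwardRule("<td>Price:</td><td>", "</td>")` and `BackwardRule("<td>", "</td></tr>")`. If a page's label changes from "Price:" to "Our price:", the forward rule no longer matches. The backward rule still returns `$12.50`, because it relies on the context to the right of the value.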
Year | DOI | Venue |
---|---|---|
2001 | 10.1145/502932.502938 | WIDM |
Keywords | Field | DocType |
---|---|---|
information extraction recovery,web wrapper,alternative classifier,real web provider change,conventional forward wrapper,relevant information,page structure,web information provider,content feature,web page,information extraction,web pages | Static web page,Data interoperability,Data mining,World Wide Web,Web page,Information retrieval,Computer science,Information extraction,Web information | Conference |
ISBN | Citations | PageRank |
---|---|---|
1-58113-444-4 | 15 | 0.79 |
References | Authors |
---|---|
8 | 1 |
Name | Order | Citations | PageRank |
---|---|---|---|
Boris Chidlovskii | 1 | 411 | 52.58 |