Abstract |
---|
We present a demonstration of an interactive wrapper induction system, called Pictor, which minimizes labeling cost yet extracts data from a website with high accuracy. Our demonstration will introduce two proposed technologies: record-level wrappers and a wrapper-assisted labeling strategy. These approaches allow Pictor to exploit previously generated wrappers in order to predict similar labels in a partially labeled webpage or a completely new webpage. Our experimental results show the effectiveness of the Pictor system. |
Year | DOI | Venue |
---|---|---|
2008 | 10.1145/1401890.1402028 | KDD |
Keywords | Field | DocType |
proposed technology, new webpage, record-level wrapper, experiment result, interactive wrapper induction system, pictor system, interactive system, similar label, high accuracy, information extraction | Data mining, World Wide Web, Web page, Information retrieval, Computer science, Exploit, Information extraction | Conference |
Citations | PageRank | References |
1 | 0.37 | 9 |
Authors |
---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Shuyi Zheng | 1 | 256 | 11.22 |
Matthew R. Scott | 2 | 93 | 10.84 |
Ruihua Song | 3 | 1138 | 59.33 |
Ji-Rong Wen | 4 | 4431 | 265.98 |