Abstract | ||
---|---|---|
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases retrieval precision for queries that would otherwise return irrelevant results. We believe that reducing the number of irrelevant results encourages users to return to a given site to search. Our experimental results on several different web sites and on the whole cnnfn collection demonstrate the feasibility of our approach. |
Year | DOI | Venue |
---|---|---|
2003 | 10.1145/956863.956961 | CIKM |
Keywords | Field | DocType |
whole cnnfn collection, web page template, novel approach, different web site, unstructured data, retrieval precision, web document, irrelevant result, web pages, information retrieval | Data mining, Site map, Web page, Information retrieval, Computer science, Website Parse Template, Unstructured data, Information extraction, Template | Conference |
ISBN | Citations | PageRank |
1-58113-723-0 | 17 | 1.02 |
References | Authors | |
8 | 4 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Ling Ma | 1 | 50 | 5.36 |
Nazli Goharian | 2 | 460 | 49.93 |
Abdur Chowdhury | 3 | 2013 | 160.59 |
Misun Chung | 4 | 17 | 1.02 |