Abstract | ||
---|---|---|
Web page classification has been extensively researched, using different types of features that are extracted from the page content, the page structure, or other pages that link to that page. Using features from the page itself means having to download it before it can be classified. We present an experiment to prove that URL tokens contain enough information to extract features for classifying web pages. A classifier based on these features can classify a web page without downloading it first, avoiding unnecessary downloads. |
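
To make the idea of URL tokens concrete, the following is a minimal sketch of how a URL might be split into tokens and counted as features without fetching the page. It is an illustration only, not the method used in the paper: the function names `url_tokens` and `token_features`, the choice of delimiters, and the example URL are all assumptions.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, unquote


def url_tokens(url: str) -> list:
    """Split a URL into lowercase tokens without downloading the page.

    Hypothetical helper: the tokenization rule (break on any
    non-alphanumeric character) is an assumption for illustration.
    """
    parts = urlsplit(url)
    # Host, path and query are kept; the scheme and fragment are ignored.
    text = " ".join([parts.netloc, unquote(parts.path), unquote(parts.query)])
    return [tok for tok in re.split(r"[^0-9A-Za-z]+", text.lower()) if tok]


def token_features(url: str) -> Counter:
    """Bag-of-tokens feature counts for one URL (hypothetical feature set)."""
    return Counter(url_tokens(url))


if __name__ == "__main__":
    example = "http://example.com/products/book-1234?ref=home"
    print(url_tokens(example))
    print(token_features(example))
```

Feature counts of this kind could be passed to any standard classifier; which features and which classifier work best is exactly what an experiment such as the one described above would need to establish.
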
Year | DOI | Venue |
---|---|---|
2012 | 10.1007/978-3-642-28795-4_13 | TRENDS IN PRACTICAL APPLICATIONS OF AGENTS AND MULTIAGENT SYSTEMS |
Field | DocType | Volume |
Same-origin policy,Static web page,World Wide Web,Information retrieval,Web page,Computer science,Download,Anchor text,Tree edit distance,Classifier (linguistics) | Conference | 157 |
ISSN | Citations | PageRank |
1867-5662 | 3 | 0.44 |
References | Authors | |
16 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Inma Hernández | 1 | 76 | 10.72 |
Carlos R. Rivero | 2 | 111 | 16.25 |
David Ruiz | 3 | 152 | 20.62 |
José Luis Arjona | 4 | 19 | 5.71 |