An Experiment to Test URL Features for Web Page Classification. - Citegraph

Paper Info

Title
An Experiment to Test URL Features for Web Page Classification.

Abstract
Web page classification has been extensively researched, using different types of features that are extracted from the page content, the page structure, or other pages that link to that page. Using features from the page itself means the page must be downloaded before it can be classified. We present an experiment to prove that URL tokens contain enough information to extract features for classifying web pages. A classifier based on these features is able to classify a web page without downloading it first, avoiding unnecessary downloads.
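
As a rough illustration of what "URL tokens" could look like in practice, the sketch below splits a URL into lowercase tokens that a classifier might use as features. It is not the authors' method: the function name `url_tokens` and the choice to split the host, path, and query on non-alphanumeric characters are assumptions made for this example only.

```python
import re
from urllib.parse import urlsplit


def url_tokens(url):
    """Split a URL into lowercase tokens from its host, path, and query.

    Hypothetical helper for illustration; the tokenization rule
    (split on any non-alphanumeric character) is an assumption.
    """
    parts = urlsplit(url)
    # Join the components that usually carry descriptive words.
    text = " ".join([parts.netloc, parts.path, parts.query])
    # Split on runs of non-alphanumeric characters and drop empty pieces.
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]
```

Calling `url_tokens("http://example.com/products/item?id=42")` yields the tokens `example`, `com`, `products`, `item`, `id`, and `42`, all obtained without fetching the page.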

Year	2012
DOI	10.1007/978-3-642-28795-4_13
Venue	TRENDS IN PRACTICAL APPLICATIONS OF AGENTS AND MULTIAGENT SYSTEMS
Volume	157
ISSN	1867-5662
DocType	Conference
Field	Same-origin policy, Static web page, World Wide Web, Information retrieval, Web page, Computer science, Download, Anchor text, Tree edit distance, Classifier (linguistics)
Citations	3
PageRank	0.44
References	16
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Inma Hernández	1	76	10.72
Carlos R. Rivero	2	111	16.25
David Ruiz	3	152	20.62
José Luis Arjona	4	19	5.71
