Title
SPHINX: a framework for creating personal, site-specific Web crawlers
Abstract
Crawlers, also called robots and spiders, are programs that browse the World Wide Web autonomously. This paper describes SPHINX, a Java toolkit and interactive development environment for Web crawlers. Unlike other crawler development systems, SPHINX is geared towards developing crawlers that are Web-site-specific, personally customized, and relocatable. SPHINX allows site-specific crawling rules to be encapsulated and reused in content analyzers, known as classifiers. Personal crawling tasks can be performed (often without programming) in the Crawler Workbench, an interactive environment for crawler development and testing. For efficiency, relocatable crawlers developed using SPHINX can be uploaded and executed on a remote Web server.
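The abstract's central idea is the classifier: a reusable content analyzer that encapsulates site-specific crawling rules by attaching labels to pages. A minimal sketch of that idea follows; the names here (`Classifier`, `Page`, `NewsSiteClassifier`) are illustrative assumptions, not the actual SPHINX API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the "classifier" concept from the abstract:
// a pluggable analyzer that inspects a fetched page and attaches labels,
// so site-specific rules can be encapsulated and reused across crawlers.
interface Classifier {
    void classify(Page page);
}

// Minimal page representation: URL, raw content, and accumulated labels.
class Page {
    final String url;
    final String content;
    final Set<String> labels = new HashSet<>();
    Page(String url, String content) {
        this.url = url;
        this.content = content;
    }
}

// A site-specific rule set for a hypothetical news site: pages under
// /archive/ are labeled "archive"; pages containing a byline are "article".
class NewsSiteClassifier implements Classifier {
    public void classify(Page page) {
        if (page.url.contains("/archive/")) {
            page.labels.add("archive");
        }
        if (page.content.contains("byline")) {
            page.labels.add("article");
        }
    }
}

public class ClassifierDemo {
    public static void main(String[] args) {
        Page p = new Page("http://news.example.com/archive/001.html",
                          "byline: A. Reporter");
        new NewsSiteClassifier().classify(p);
        System.out.println(p.labels);
    }
}
```

In this design a crawler would run every registered classifier over each fetched page, and downstream crawling decisions (e.g., which links to follow) could consult the resulting labels, keeping the site-specific knowledge in one reusable place.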
Year: 1998
DOI: 10.1016/S0169-7552(98)00064-6
Venue: Computer Networks
Keywords: robots, web crawler, world wide web, java
DocType: Journal
Volume: 30
Issue: 1-7
ISSN: 0169-7552
Citations: 89
PageRank: 23.89
References: 13
Authors: 2
Name              | Order | Citations | PageRank
Robert C. Miller  | 1     | 4412      | 326.00
Krishna A. Bharat | 2     | 1211      | 252.86