Title
First-order focused crawling
Abstract
This paper presents a new general framework for focused web crawling based on "relational subgroup discovery". Predicates are used explicitly to represent the relevance clues of unvisited pages in the crawl frontier, and first-order classification rules are then induced using a subgroup discovery technique. The learned relational rules that have sufficient support and confidence subsequently guide the crawling process. We present many interesting features of the proposed first-order focused crawler, together with promising preliminary experimental results.
Categories and Subject Descriptors: H.5.4 (Information interfaces and presentation): Hypertext/hypermedia; I.2.6 (Artificial intelligence): Learning
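The role of support and confidence in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: rules are simplified here to conjunctions of ground predicates (real first-order rules would contain variables), support is assumed to be the fraction of all labelled pages that a rule covers and that are relevant, confidence is assumed to be the fraction of covered pages that are relevant, and every name (`Page`, `Rule`, `select_rules`, `rank_frontier`, the predicate tuples) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    facts: frozenset        # ground predicates, e.g. ("anchor_contains", "crawler")
    relevant: bool = False  # label, known only for already-visited pages


@dataclass(frozen=True)
class Rule:
    body: frozenset  # conjunction of ground predicates (assumed simplification)

    def covers(self, facts):
        # A rule fires when every predicate in its body holds for the page.
        return self.body <= facts


def support_and_confidence(rule, labelled_pages):
    covered = [p for p in labelled_pages if rule.covers(p.facts)]
    hits = sum(1 for p in covered if p.relevant)
    support = hits / len(labelled_pages) if labelled_pages else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


def select_rules(candidates, labelled_pages, min_support, min_confidence):
    # Keep only rules that clear both thresholds, paired with their confidence.
    kept = []
    for rule in candidates:
        s, c = support_and_confidence(rule, labelled_pages)
        if s >= min_support and c >= min_confidence:
            kept.append((rule, c))
    return kept


def rank_frontier(frontier, kept_rules):
    # Score an unvisited page by the most confident rule that covers it.
    def score(page):
        return max((c for rule, c in kept_rules if rule.covers(page.facts)),
                   default=0.0)
    return sorted(frontier, key=score, reverse=True)


if __name__ == "__main__":
    a = ("anchor_contains", "crawler")
    b = ("parent_relevant",)
    visited = [
        Page("u1", frozenset({a, b}), True),
        Page("u2", frozenset({a}), True),
        Page("u3", frozenset({a}), False),
    ]
    rules = select_rules([Rule(frozenset({a}))], visited, 0.3, 0.6)
    frontier = [Page("f1", frozenset({b})), Page("f2", frozenset({a}))]
    print([p.url for p in rank_frontier(frontier, rules)])
```

Taking the maximum confidence over covering rules is only one possible way to turn a rule set into a crawl priority; a weighted vote over all covering rules would be an equally plausible choice.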
Year
2007
DOI
10.1145/1242572.1242744
Venue
WWW
Keywords
General terms: algorithms, measurements. Keywords: focused crawling, proposed first-order, interesting feature, first-order classification rule, relational rule, relational subgroup discovery, subgroup discovery technique, new general framework, performance, focused web, crawl frontier, crawling process, first order, web crawling, artificial intelligence
Field
Data mining, World Wide Web, Crawling, Information retrieval, First order, Computer science, Focused crawler, Web crawler
DocType
Conference
Citations
4
PageRank
0.47
References
4
Authors
2
Name          Order  Citations  PageRank
Qingyang Xu   1      15         2.85
Wanli Zuo     2      342        42.73