Abstract | ||
---|---|---|
This paper reports a new general framework for focused web crawling based on "relational subgroup discovery". Predicates are used explicitly to represent the relevance clues of the unvisited pages in the crawl frontier, and first-order classification rules are then induced using a subgroup discovery technique. The learned relational rules with sufficient support and confidence then guide the crawling process. We present the interesting features of our proposed first-order focused crawler, together with promising preliminary experimental results. Categories and Subject Descriptors: H.5.4 (Information interfaces and presentation): Hypertext/hypermedia; I.2.6 (Artificial intelligence): Learning |
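
The abstract's idea of keeping only rules "with sufficient support and confidence" and using them to guide the crawl can be illustrated with a minimal sketch. This is not the authors' method: it uses propositional boolean clues instead of first-order predicates, and it assumes support means the fraction of all examples that a rule covers and that are relevant, and confidence means the fraction of covered examples that are relevant. The names `Rule`, `support_and_confidence`, `select_rules`, `rank_frontier`, the clue names, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Page = Dict[str, bool]  # hypothetical boolean clues about an unvisited URL


@dataclass(frozen=True)
class Rule:
    name: str
    body: Callable[[Page], bool]  # rule body; the head is always "relevant"


def support_and_confidence(
    rule: Rule, examples: List[Tuple[Page, bool]]
) -> Tuple[float, float]:
    # Labels of the examples whose clues satisfy the rule body.
    covered = [relevant for page, relevant in examples if rule.body(page)]
    if not covered:
        return 0.0, 0.0
    hits = sum(covered)  # covered examples that are actually relevant
    # Assumed definitions: support = hits / all examples,
    # confidence = hits / covered examples.
    return hits / len(examples), hits / len(covered)


def select_rules(
    rules: List[Rule],
    examples: List[Tuple[Page, bool]],
    min_support: float,
    min_confidence: float,
) -> List[Rule]:
    # Keep only the rules that clear both thresholds.
    selected = []
    for rule in rules:
        support, confidence = support_and_confidence(rule, examples)
        if support >= min_support and confidence >= min_confidence:
            selected.append(rule)
    return selected


def rank_frontier(frontier: Dict[str, Page], rules: List[Rule]) -> List[str]:
    # Order frontier URLs by how many selected rules fire on them (most first).
    return sorted(
        frontier, key=lambda url: -sum(r.body(frontier[url]) for r in rules)
    )


# Hypothetical labelled pages already crawled: (clues, was the page relevant?).
examples = [
    ({"anchor_has_topic": True, "parent_relevant": True}, True),
    ({"anchor_has_topic": True, "parent_relevant": False}, True),
    ({"anchor_has_topic": False, "parent_relevant": True}, False),
    ({"anchor_has_topic": False, "parent_relevant": False}, False),
]
rule_anchor = Rule("anchor_has_topic", lambda p: p["anchor_has_topic"])
rule_parent = Rule("parent_relevant", lambda p: p["parent_relevant"])

kept = select_rules([rule_anchor, rule_parent], examples, 0.3, 0.8)

frontier = {
    "http://example.org/a": {"anchor_has_topic": False, "parent_relevant": True},
    "http://example.org/b": {"anchor_has_topic": True, "parent_relevant": False},
}
ordering = rank_frontier(frontier, kept)
```

With these toy examples only the `anchor_has_topic` rule clears both thresholds, so `http://example.org/b` is ranked ahead of `http://example.org/a`. A relational version would replace the boolean clues with predicates over pages and links.
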
Year | DOI | Venue |
---|---|---|
2007 | 10.1145/1242572.1242744 | WWW |
Keywords | Field | DocType |
General terms: algorithms,measurements; Keywords: focused crawling,proposed first-order,interesting feature,first-order classification rule,relational rule,relational subgroup discovery,subgroup discovery technique,new general framework,performance,focused web,crawl frontier,crawling process,first order,web crawling,artificial intelligence | Data mining,World Wide Web,Crawling,Information retrieval,First order,Computer science,Focused crawler,Web crawler | Conference |
Citations | PageRank | References |
4 | 0.47 | 4 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Qingyang Xu | 1 | 15 | 2.85 |
Wanli Zuo | 2 | 342 | 42.73 |