Title | ||
---|---|---|
An effective relevance prediction algorithm based on hierarchical taxonomy for focused crawling |
Abstract | ||
---|---|---|
How to formally describe a user's topic of interest and effectively predict the relevance of unvisited pages to that topic is a key issue in the design of focused crawlers. However, almost all previously known focused crawlers perform Relevance Prediction based on the Flat Information (RPFI) of the topic only, i.e., without regard to the context between keywords or topics. In this paper, we first introduce an algorithm that maps a topic described by a keyword set or a natural-language document to a topic described in a hierarchical topic taxonomy. Then, we propose a novel approach that performs Relevance Prediction based on the Hierarchical Context Information (RPHCI) of the taxonomy. Experiments show that a focused crawler based on RPHCI achieves significantly higher efficiency than those based on RPFI. |
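
The abstract contrasts flat, keyword-only relevance with relevance that uses a topic's position in a hierarchy. The sketch below is only an illustration of that contrast, not the paper's RPHCI algorithm, whose details are not given in this record. The toy taxonomy, the function names, and the use of a Wu-Palmer-style tree similarity as the hierarchical score are all assumptions made for the example.

```python
# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "Top": None,
    "Computers": "Top",
    "Sports": "Top",
    "Programming": "Computers",
    "Python": "Programming",
    "Java": "Programming",
    "Soccer": "Sports",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def flat_relevance(page_terms, topic_keywords):
    """Flat score: fraction of topic keywords that occur in the page terms."""
    topic = set(topic_keywords)
    if not topic:
        return 0.0
    return len(topic & set(page_terms)) / len(topic)


def hierarchical_relevance(page_node, topic_node):
    """Wu-Palmer-style score: 2 * depth(lca) / (depth(page) + depth(topic)).

    Depth counts nodes on the path to the root, so the root has depth 1.
    This is an assumed stand-in for a hierarchy-aware score.
    """
    page_path = path_to_root(page_node)
    topic_path = path_to_root(topic_node)
    topic_ancestors = set(topic_path)
    # First node on the page's upward path that is also an ancestor of the topic.
    lca = next(n for n in page_path if n in topic_ancestors)
    lca_depth = len(path_to_root(lca))
    return 2 * lca_depth / (len(page_path) + len(topic_path))


if __name__ == "__main__":
    # A page about Java shares no keywords with a "python" topic ...
    print(flat_relevance(["java", "tutorial"], ["python", "programming"]))
    # ... but its taxonomy node is a sibling of the topic node.
    print(hierarchical_relevance("Java", "Python"))
    print(hierarchical_relevance("Soccer", "Python"))
```

In this toy setting a page classified under `Java` gets a flat score of zero for a `Python` topic but a high hierarchical score, because both nodes sit under `Programming`, while a `Soccer` page scores low on both.
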
Year | DOI | Venue |
---|---|---|
2008 | 10.1007/978-3-540-68636-1_72 | AIRS |
Keywords | Field | DocType |
---|---|---|
focused crawler,hierarchical context information,focused crawling,relevance predication,higher efficiency,formal description,hierarchical topic taxonomy,interested topic,hierarchical taxonomy,effective relevance prediction algorithm,keyword set,flat information,key issue,natural language | Data mining,Crawling,Information retrieval,Computer science,Relevance prediction,Algorithm,Formal description,Natural language,Focused crawler,Natural language processing,Artificial intelligence | Conference |
Volume | Issue | ISSN |
---|---|---|
4993 | null | 0302-9743 |
ISBN | Citations | PageRank |
---|---|---|
3-540-68633-9 | 4 | 0.39 |
References | Authors | |
---|---|---|
7 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhumin Chen | 1 | 393 | 35.53 |
Jun Ma | 2 | 1280 | 127.50 |
Xiaohui Han | 3 | 17 | 5.41 |
Dongmei Zhang | 4 | 1439 | 132.94 |