Title
An effective relevance prediction algorithm based on hierarchical taxonomy for focused crawling
Abstract
How to formally describe a user's topic of interest and effectively predict the relevance of unvisited pages to that topic is a key issue in the design of focused crawlers. However, almost all previously known focused crawlers perform Relevance Prediction based on the Flat Information (RPFI) of the topic only, i.e. without regard to the context between keywords or topics. In this paper, we first introduce an algorithm that maps a topic described by a keyword set or a natural-language document to one described in a hierarchical topic taxonomy. We then propose a novel approach that performs Relevance Prediction based on the Hierarchical Context Information (RPHCI) of the taxonomy. Experiments show that a focused crawler based on RPHCI achieves significantly higher efficiency than those based on RPFI.
Year: 2008
DOI: 10.1007/978-3-540-68636-1_72
Venue: AIRS
Keywords: focused crawler, hierarchical context information, focused crawling, relevance predication, higher efficiency, formal description, hierarchical topic taxonomy, interested topic, hierarchical taxonomy, effective relevance prediction algorithm, keyword set, flat information, key issue, natural language
Field: Data mining, Crawling, Information retrieval, Computer science, Relevance prediction, Algorithm, Formal description, Natural language, Focused crawler, Natural language processing, Artificial intelligence
DocType: Conference
Volume: 4993
Issue: null
ISSN: 0302-9743
ISBN: 3-540-68633-9
Citations: 4
PageRank: 0.39
References: 7
Authors: 4
Name            Order   Citations   PageRank
Zhumin Chen     1       393         35.53
Jun Ma          2       1280        127.50
Xiaohui Han     3       17          5.41
Dongmei Zhang   4       1439        132.94