Abstract |
---|
A focused crawler traverses the Web to gather documents on a specific topic. Predicting which links lead to relevant pages is difficult. In this paper, we present a new approach for predicting the links that lead to relevant pages, based on a learned user model. Specifically, we first collect the pages a user visits during a learning session, in which the user browses the Web and explicitly marks the pages she is interested in. We then examine the semantic content of these pages to construct a concept graph, which is used to learn the dominant content and link structure leading to target pages with a Hidden Markov Model (HMM). Experiments show that a crawler guided by an HMM learned from a user's browsing performs better than a Best-First crawling strategy. |
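
The abstract does not spell out the model, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that hidden states are the number of link hops to the nearest page the user marked (0 = target), that observations are discrete concept IDs assigned to each page (a simple stand-in for nodes of the concept graph), and that each training sequence is a path from the learning session ending at a marked page. Parameters are estimated by smoothed counting. The forward algorithm then estimates the state of the current page, and every unvisited link inherits the expected hop count of the next page as its frontier priority (lower is better). All function names (`learn_hmm`, `forward`, `expected_hops_next`, `crawl`), the toy graph, and the concept IDs are hypothetical.

```python
import heapq

# Hypothetical setup (not from the paper): hidden state = hops to the nearest
# page the user marked (0 = target); observation = a page's concept ID.


def _normalise(row):
    total = sum(row)
    return [x / total for x in row]


def learn_hmm(paths, n_states, n_obs, smoothing=1.0):
    """Estimate start, transition and emission probabilities by counting.

    paths: sequences of (state, concept_id) pairs from the learning session,
    each ending at a marked page. Laplace smoothing avoids zero probabilities.
    """
    start = [smoothing] * n_states
    trans = [[smoothing] * n_states for _ in range(n_states)]
    emit = [[smoothing] * n_obs for _ in range(n_states)]
    for path in paths:
        start[path[0][0]] += 1
        for state, obs in path:
            emit[state][obs] += 1
        for (prev, _), (nxt, _) in zip(path, path[1:]):
            trans[prev][nxt] += 1
    return (_normalise(start),
            [_normalise(r) for r in trans],
            [_normalise(r) for r in emit])


def forward(obs_seq, start, trans, emit):
    """Forward algorithm: P(current state | concepts seen along the path)."""
    n = len(start)
    alpha = _normalise([start[s] * emit[s][obs_seq[0]] for s in range(n)])
    for obs in obs_seq[1:]:
        alpha = _normalise([
            emit[s][obs] * sum(alpha[r] * trans[r][s] for r in range(n))
            for s in range(n)
        ])
    return alpha


def expected_hops_next(obs_seq, start, trans, emit):
    """Expected hops-to-target of the page one link beyond the current one."""
    alpha = forward(obs_seq, start, trans, emit)
    n = len(start)
    p_next = [sum(alpha[r] * trans[r][s] for r in range(n)) for s in range(n)]
    return sum(s * p for s, p in enumerate(p_next))


def crawl(seeds, get_links, get_concept, model, budget):
    """Best-first crawl whose frontier priority comes from the HMM (lower = better)."""
    start, trans, emit = model
    frontier = [(0.0, i, url, ()) for i, url in enumerate(seeds)]
    heapq.heapify(frontier)
    counter = len(seeds)  # tie-breaker so heapq never compares URLs or paths
    visited, order = set(), []
    while frontier and len(order) < budget:
        _, _, url, path = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        obs_path = path + (get_concept(url),)
        score = expected_hops_next(obs_path, start, trans, emit)
        for child in get_links(url):
            if child not in visited:
                heapq.heappush(frontier, (score, counter, child, obs_path))
                counter += 1
    return order


# Toy data (hypothetical). Concepts: 0 = on-topic, 1 = related, 2 = unrelated.
TRAINING_PATHS = [
    [(2, 2), (1, 1), (0, 0)],
    [(2, 2), (2, 2), (1, 1), (0, 0)],
    [(1, 1), (0, 0)],
]
LINKS = {"s": ["a", "b"], "a": ["c"], "b": ["t"], "c": [], "t": []}
CONCEPT = {"s": 1, "a": 2, "b": 1, "c": 2, "t": 0}

if __name__ == "__main__":
    model = learn_hmm(TRAINING_PATHS, n_states=3, n_obs=3)
    print(crawl(["s"], LINKS.get, CONCEPT.get, model, budget=10))
```

On this toy graph the script prints `['s', 'a', 'b', 't', 'c']`: the target page `t` is fetched before `c` even though `c` was discovered first, because a link found on a "related" page receives a lower expected hop count than one found on an "unrelated" page. A breadth-first crawler would fetch `c` before `t`. Because a link's priority is computed before the linked page is downloaded, it depends only on the concepts observed along the path leading to it.
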
Year | DOI | Venue |
---|---|---|
2004 | 10.1109/WI.2004.70 | Web Intelligence |
Keywords | Field | DocType |
---|---|---|
focused crawler,focused crawling,user visit,collect page,easy task,best-first strategy,dominant content,learning hmm,topic-specific browsing,hidden markov model,concept graph,semantic content,user model,computer science,web pages,intelligent agent,mathematics,predictive models,world wide web,search engines,e commerce,statistics,information retrieval,web service,hidden markov models | Data mining,Intelligent agent,World Wide Web,Crawling,Information retrieval,Web page,Computer science,User modeling,Focused crawler,Web service,Hidden Markov model,E-commerce | Conference |
ISBN | Citations | PageRank |
---|---|---|
0-7695-2100-2 | 3 | 0.41 |
References | Authors |
---|---|
7 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hong-Yu Liu | 1 | 183 | 23.11 |
Evangelos Milios | 2 | 3073 | 360.46 |
Jeannette Janssen | 3 | 295 | 32.23 |