Title
Focused Crawling by Learning HMM from User's Topic-specific Browsing
Abstract
A focused crawler traverses the Web to gather documents on a specific topic. Predicting which links lead to relevant pages is difficult. In this paper, we present a new approach that predicts which links are likely to lead to relevant pages, based on a learned user model. We first collect the pages that a user visits during a learning session, in which the user browses the Web and explicitly marks the pages she is interested in. We then examine the semantic content of these pages to construct a concept graph, which is used to learn the dominant content and link structure leading to target pages with a Hidden Markov Model (HMM). Experiments show that crawling with an HMM learned from a user's browsing performs better than the Best-First strategy.
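The idea of ranking links with an HMM can be illustrated with a minimal sketch. The abstract does not give the model details, so everything below is an assumption made for illustration: the hidden states are taken to be the distance (in links) from a target page, the observations are taken to be concept labels of the pages along a path, and all probabilities, function names, and URLs are invented. The sketch uses the standard forward recursion to estimate the current hidden state, then scores a link by the probability that the next state is the target.

```python
import heapq

# Toy model: every number here is invented for illustration only.
# Hidden states (assumed): 0 = target page, 1 = one link away, 2 = two or more links away.
# Observations (assumed): 0 = on-topic concept, 1 = related concept, 2 = off-topic concept.
INITIAL = [0.2, 0.3, 0.5]
TRANSITION = [[0.5, 0.3, 0.2],
              [0.6, 0.3, 0.1],
              [0.1, 0.5, 0.4]]
EMISSION = [[0.7, 0.2, 0.1],
            [0.3, 0.5, 0.2],
            [0.1, 0.3, 0.6]]


def state_distribution(observations):
    """Normalized forward recursion: P(current hidden state | observations)."""
    n = len(INITIAL)
    alpha = [INITIAL[s] * EMISSION[s][observations[0]] for s in range(n)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    for obs in observations[1:]:
        # Predict the next state, then weight by how well it explains obs.
        alpha = [
            EMISSION[j][obs] * sum(alpha[i] * TRANSITION[i][j] for i in range(n))
            for j in range(n)
        ]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
    return alpha


def link_priority(observations):
    """Probability that the page behind the next link is in the target state."""
    alpha = state_distribution(observations)
    return sum(alpha[i] * TRANSITION[i][0] for i in range(len(alpha)))


def rank_links(candidates):
    """Order candidate URLs by descending priority using a heap-based frontier."""
    heap = [(-link_priority(path), url) for url, path in candidates.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


if __name__ == "__main__":
    # Hypothetical candidates: URL -> concept labels of the pages on the path so far.
    candidates = {
        "http://example.org/a": [1, 0],  # related page, then on-topic page
        "http://example.org/b": [2, 2],  # two off-topic pages
    }
    print(rank_links(candidates))
```

In this toy setting, a path whose pages move toward on-topic concepts receives a higher priority than a path through off-topic pages, so its outgoing link is crawled first. A Best-First crawler, by contrast, would typically score a link from the content of the linking page alone rather than from the sequence of pages leading to it.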
Year
2004
DOI
10.1109/WI.2004.70
Venue
Web Intelligence
Keywords
focused crawler, focused crawling, user visit, collect page, easy task, best-first strategy, dominant content, learning hmm, topic-specific browsing, hidden markov model, concept graph, semantic content, user model, computer science, web pages, intelligent agent, mathematics, predictive models, world wide web, search engines, e commerce, statistics, information retrieval, web service, hidden markov models
Field
Data mining, Intelligent agent, World Wide Web, Crawling, Information retrieval, Web page, Computer science, User modeling, Focused crawler, Web service, Hidden Markov model, E-commerce
DocType
Conference
ISBN
0-7695-2100-2
Citations
3
PageRank
0.41
References
7
Authors
3
Name                Order   Citations   PageRank
Hong-Yu Liu         1       183         23.11
Evangelos Milios    2       3073        360.46
Jeannette Janssen   3       295         32.23