Title
Adaptive on-line page importance computation
Abstract
The computation of page importance in a huge dynamic graph has recently attracted a lot of attention because of the web. Page importance, or page rank is defined as the fixpoint of a matrix equation. Previous algorithms compute it off-line and require the use of a lot of extra CPU as well as disk resources (e.g. to store, maintain and read the link matrix). We introduce a new algorithm OPIC that works on-line, and uses much less resources. In particular, it does not require storing the link matrix. It is on-line in that it continuously refines its estimate of page importance while the web/graph is visited. Thus it can be used to focus crawling to the most interesting pages. We prove the correctness of OPIC. We present Adaptive OPIC that also works on-line but adapts dynamically to changes of the web. A variant of this algorithm is now used by Xyleme.We report on experiments with synthetic data. In particular, we study the convergence and adaptiveness of the algorithms for various scheduling strategies for the pages to visit. We also report on experiments based on crawls of significant portions of the web.
Year
DOI
Venue
2003
10.1145/775152.775192
WWW
Keywords
Field
DocType
link matrix,new algorithm opic,interesting page,matrix equation,adaptive opic,adapts dynamically,adaptive on-line page importance,previous algorithm,page importance,huge dynamic graph,page rank,web pages,hyperlink,synthetic data
Static web page,Scheduling (computing),Computer science,Correctness,Theoretical computer science,Synthetic data,Hyperlink,Artificial intelligence,Fixed point,Computation,World Wide Web,Crawling,Machine learning
Conference
ISBN
Citations 
PageRank 
1-58113-680-3
121
7.12
References 
Authors
17
3
Search Limit
100121
Name
Order
Citations
PageRank
Serge Abiteboul190952941.83
Mihai Preda221219.88
Gregory Cobena358538.41