Abstract | ||
---|---|---|
We present a framework for approximating random-walk based probability distributions over Web pages using graph aggregation. The basic idea is to partition the graph into classes of quasi-equivalent vertices, to project the page-based random walk to be approximated onto those classes, and to compute the stationary probability distribution of the resulting class-based random walk. From this distribution we can quickly reconstruct a distribution on pages. In particular, our framework can approximate the well-known PageRank distribution by setting the classes according to the set of pages on each Web host.We experimented on a Web-graph containing over 1.4 billion pages and over 6.6 billion links from a crawl of the Web conducted by AltaVista in September 2003. We were able to produce a ranking that has Spearman rank-order correlation of 0.95 with respect to PageRank. The clock time required by a simplistic implementation of our method was less than half the time required by a highly optimized implementation of PageRank, implying that larger speedup factors are probably possible. |
Year | DOI | Venue |
---|---|---|
2006 | 10.1007/s10791-006-7146-1 | World Wide Web Conference Series |
Keywords | DocType | Volume |
clock time,class-based random walk,efficient pagerank approximation,well-known pagerank distribution,web ir · citation and link analysis,web host,graph aggregation,billion link,web page,billion page,stationary probability distribution,probability distribution,random walk,web pages,link analysis | Journal | 9 |
Issue | ISSN | ISBN |
2 | 1386-4564 | 1-58113-912-8 |
Citations | PageRank | References |
49 | 2.28 | 21 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Andrei Broder | 1 | 7357 | 920.20 |
R. Lempel | 2 | 77 | 4.01 |
Farzin Maghoul | 3 | 1198 | 173.90 |
Jan O. Pedersen | 4 | 6301 | 1177.07 |