Abstract |
---|
Previous studies of the web graph structure have focused on the graph structure at the level of individual pages. In actuality, the web is a hierarchically nested graph, with domains, hosts and web sites introducing intermediate levels of affiliation and administrative control. To better understand the growth of the web, we need to understand its macro-structure in terms of the linkage between web sites. In this paper we approximate this by studying the graph of the linkage between hosts on the web. This was done using snapshots of the web taken by Google in Oct 1999, Aug 2000 and Jun 2001. The connectivity between hosts is represented by a directed graph, with hosts as nodes and weighted edges representing the count of hyperlinks between pages on the corresponding hosts. We demonstrate how such a "hostgraph" can be used to study connectivity properties of hosts and domains over time, and discuss a modified "copy model" to explain observed link weight distributions as a function of subgraph size. We discuss changes over time in the size and connectivity of web sites and country domains. We also describe a data mining application of the hostgraph: a related-host finding algorithm that achieves a precision of 0.65 at rank 3. |
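The hostgraph described above collapses page-level hyperlinks into weighted host-to-host edges. The following is a minimal sketch of that aggregation, assuming the input is a plain list of (source URL, target URL) pairs; the function names `build_hostgraph` and `weight_distribution` and the `include_self_loops` flag are hypothetical and not taken from the paper. In particular, whether intra-host links are kept is an assumption left as a parameter.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased host name of an absolute URL, or "" if it has none."""
    return urlsplit(url).hostname or ""


def build_hostgraph(page_links, include_self_loops=False):
    """Aggregate page-level hyperlinks into a weighted, directed host graph.

    page_links: iterable of (source_url, target_url) pairs.
    Returns a Counter mapping (source_host, target_host) -> number of hyperlinks.
    include_self_loops is a hypothetical option: the sketch drops links between
    pages on the same host unless it is set to True.
    """
    edges = Counter()
    for src, dst in page_links:
        s, d = host_of(src), host_of(dst)
        if not s or not d:
            continue  # skip relative or malformed URLs in this sketch
        if s == d and not include_self_loops:
            continue  # intra-host link; excluded by default (assumption)
        edges[(s, d)] += 1  # edge weight = count of page-level hyperlinks
    return edges


def weight_distribution(edges):
    """Map each edge weight w to the number of host pairs whose edge has weight w."""
    return Counter(edges.values())


if __name__ == "__main__":
    links = [
        ("http://a.example.com/x", "http://b.example.org/"),
        ("http://a.example.com/y", "http://b.example.org/z"),
        ("http://a.example.com/y", "http://a.example.com/"),
        ("http://c.example.net/", "http://b.example.org/"),
    ]
    graph = build_hostgraph(links)
    print(dict(graph))
    print(dict(weight_distribution(graph)))
```

Using a `Counter` keyed by host pairs keeps the sketch short; at web scale one would instead stream the link pairs and aggregate them externally, but the edge-weight semantics (number of hyperlinks between pages on the two hosts) stay the same. The weight histogram is the kind of link weight distribution the abstract refers to.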
Year | DOI | Venue |
---|---|---|
2001 | 10.1109/ICDM.2001.989500 | ICDM |
Keywords | Field | DocType |
---|---|---|
hierarchically nested graph,web sites,graph structure,corresponding host,web graph structure,web site,copy model,country domain,subgraph size,connectivity property,mining linkage,data mining application,data mining,web pages,bibliometrics,country domains,directed graphs,citation analysis,couplings,navigation,weight distribution,directed graph,computer science,hyperlinks | Data mining,Graph,Web page,Hypermedia,Computer science,Citation analysis,Directed graph,Artificial intelligence,Hyperlink,Snapshot (computer storage),Machine learning,Country code top-level domain | Conference |
ISBN | Citations | PageRank |
---|---|---|
0-7695-1119-8 | 94 | 13.22 |
References | Authors |
---|---|
14 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Krishna A. Bharat | 1 | 1211 | 252.86 |
Bay-Wei Chang | 2 | 524 | 73.00 |
Monika Rauch Henzinger | 3 | 4307 | 481.86 |
Matthias Ruhl | 4 | 608 | 48.78 |