Title
A fast algorithm to find all high degree vertices in power law graphs
Abstract
Sampling from large graphs is an area which is of great interest, particularly with the recent emergence of huge structures such as Online Social Networks. These often contain hundreds of millions of vertices and billions of edges. The large size of these networks makes it computationally expensive to obtain structural properties of the underlying graph by exhaustive search. If we can estimate these properties by taking small but representative samples from the network, then size is no longer a problem. In this paper we develop an analysis of random walks, a commonly used method of sampling from networks. We present a method of biassing the random walk to acquire a complete sample of high degree vertices of social networks, or similar graphs. The preferential attachment model is a common method to generate graphs with a power law degree sequence. For this model, we prove that this sampling method is successful with high probability. We also make experimental studies of the method on various real world networks. For t-vertex graphs G(t) generated by a preferential attachment process, we analyze a biassed random walk which makes transitions along undirected edges {x,y} proportional to [d(x)d(y)]b, where d(x) is the degree of vertex x and b 0 is a constant parameter. Let S(a) be the set of all vertices of degree at least ta in G(t). We show that for some b approx 2/3, if the biassed random walk starts at an arbitrary vertex of S(a), then with high probability the set S(a) can be discovered completely in ~O(t1-(4/3)a+d) steps, where d is a very small positive constant. The notation ~O ignores poly-log t factors. The preferential attachment process generates graphs with power law 3, so the above example is a special case of this result. For graphs with degree sequence power law c2 generated by a generalized preferential attachment process, a random walk with transitions along undirected edges {x,y} proportional to (d(x)d(y))(c-2)/2, discovers the set S(a) completely in ~O(t1-a(c-2)+d) steps with high probability. The cover time of the graph is ~O(t). Our results say that if we search preferential attachment graphs with a bias b=(c-2)/2 proportional to the power law c then, (i) we can find all high degree vertices quickly, and (ii) the time to discover all vertices is not much higher than in the case of a simple random walk. We conduct experimental tests on generated networks and real-world networks, which confirm these two properties.
Year
DOI
Venue
2012
10.1145/2187980.2188235
WWW (Companion Volume)
Keywords
Field
DocType
power law graph,power law degree sequence,high degree,random walk,undirected edge,biassed random walk,power law,high probability,preferential attachment process,high degree vertex,fast algorithm,degree sequence power law,degree sequence,random walks,sampling methods,exhaustive search,social network
Data mining,Combinatorics,Approx,Simple random sample,Vertex (geometry),Computer science,Random walk,Degree (graph theory),Sampling (statistics),Power law,Preferential attachment
Conference
Citations 
PageRank 
References 
6
0.43
18
Authors
3
Name
Order
Citations
PageRank
Colin Cooper128730.73
Tomasz Radzik2109895.68
Yiannis Siantos3473.14