Title
What Is The Real Size Of A Sampled Network? The Case Of The Internet
Abstract
Most data concerning the topology of complex networks are the result of mapping projects which bear intrinsic limitations and cannot give access to complete, unbiased datasets. A particularly interesting case is represented by the physical Internet. Router-level Internet mapping projects generally consist of sampling the network from a limited set of sources by using traceroute probes. This methodology, akin to the merging of spanning trees from the different sources to a set of destinations, leads necessarily to a partial, incomplete map of the Internet. The determination of the real Internet topology characteristics from such sampled maps is therefore, in part, a problem of statistical inference. In this paper we present a twofold contribution in order to address this problem. First, we argue that inference of some of the standard topological quantities is, in fact, a version of the so-called "species" problem in statistics, which is important in categorizing the problem and providing some indication of its inherent difficulties. Second, we tackle the issue of estimating arguably the most basic of network characteristics (its number of nodes) and propose two estimators for this quantity, based on subsampling principles. Numerical simulations, as well as an experiment based on probing the Internet, suggest the feasibility of accounting for measurement bias in reporting Internet topology characteristics.
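The sampling process described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code or either of their estimators. It only assumes a toy random graph in place of the real Internet and approximates each traceroute probe by one breadth-first shortest path. The names `random_graph`, `bfs_parents`, and `traceroute_sample`, and all parameter values, are hypothetical choices made for illustration.

```python
import random
from collections import deque


def random_graph(n, avg_degree, rng):
    """Toy Erdos-Renyi-style graph as adjacency sets (an assumed stand-in topology)."""
    adj = {v: set() for v in range(n)}
    p = avg_degree / (n - 1)
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def bfs_parents(adj, source):
    """Breadth-first search from source; maps each reached node to its BFS parent."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def traceroute_sample(adj, sources, targets):
    """Merge one shortest path per (source, target) pair into a sampled map."""
    seen_nodes, seen_edges = set(), set()
    for s in sources:
        parent = bfs_parents(adj, s)
        for t in targets:
            if t not in parent:
                continue  # target unreachable from this source
            node = t
            while node is not None:  # walk the path back to the source
                seen_nodes.add(node)
                prev = parent[node]
                if prev is not None:
                    seen_edges.add(frozenset((node, prev)))
                node = prev
    return seen_nodes, seen_edges


rng = random.Random(0)  # fixed seed so the run is repeatable
adj = random_graph(1000, 6, rng)
nodes = list(adj)
sources = rng.sample(nodes, 5)
targets = rng.sample(nodes, 100)
seen_nodes, seen_edges = traceroute_sample(adj, sources, targets)
true_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(f"nodes seen: {len(seen_nodes)} of {len(adj)}")
print(f"edges seen: {len(seen_edges)} of {true_edges}")
```

With few sources and targets, the merged paths cover only part of the graph, so the raw count of observed nodes understates the true size. Closing that gap is the inference problem the abstract refers to.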
Year
2005
DOI
10.1103/PhysRevE.75.056111
Venue
PHYSICAL REVIEW E
Keywords
internet topology, limit set, statistical inference, spanning tree
Field
Internet topology, Network mapping, Data mining, Inference, Theoretical computer science, Spanning tree, Sampling (statistics), Statistical inference, Species problem, Mathematics, The Internet
DocType
Journal
Volume
75
Issue
5
ISSN
1539-3755
Citations
15
PageRank
1.05
References
5
Authors
5
Name              Order  Citations  PageRank
Fabien Viger      1      323        16.44
Alain Barrat      2      1401       87.12
Luca Dall'Asta    3      493        39.53
Cun-Hui Zhang     4      174        18.38
Eric D. Kolaczyk  5      169        11.39