Title
Fisher Information in Flow Size Distribution
Abstract
The flow size distribution is a useful metric for traffic modeling and management. Its estimation based on sampled data, however, is problematic. Previous work has shown that flow sampling (FS) offers enormous statistical benefits over packet sampling, but its high resource requirements preclude its use in routers. We present Dual Sampling (DS), a two-parameter family that, to a large extent, provides FS-like statistical performance by approaching FS continuously, with just packet-sampling-like computational cost. Our work utilizes a Fisher information based approach recently used to evaluate a number of sampling schemes, excluding FS, for TCP flows. We revise and extend the approach to make rigorous and fair comparisons between FS, DS and others. We show how DS significantly outperforms other packet based methods, including Sample and Hold, the closest packet sampling-based competitor to FS. We describe a packet sampling-based implementation of DS and analyze its key computational costs to show that router implementation is feasible. Our approach offers insights into numerous issues, including the notion of 'flow quality' for understanding the relative performance of methods, and how and when employing sequence numbers is beneficial. Our work is theoretical with some simulation support and case studies on Internet data.
Year: 2011
Venue: CoRR
Keywords: fisher information, information theory
Field: Data mining, Mathematical optimization, Computer science, Flow (psychology), Network packet, Theoretical computer science, Sampling (statistics), Fisher information, Sample and hold, Packet sampling, Router, The Internet
DocType: Journal
Volume: abs/1106.3809
Citations: 0
PageRank: 0.34
References: 2
Authors: 2
Name | Order | Citations | PageRank
Paul Tune | 1 | 83 | 8.83
Darryl Veitch | 2 | 903 | 84.47