Title
Is sampled data sufficient for anomaly detection?
Abstract
Sampling techniques are widely used for traffic measurements at high link speeds to conserve router resources. Traditionally, sampled traffic data is used for network management tasks such as traffic matrix estimation, but recently it has also been used in numerous anomaly detection algorithms, as security analysis becomes increasingly critical for network providers. While the impact of sampling on traffic engineering metrics such as flow size and mean rate is well studied, its impact on anomaly detection remains an open question. This paper presents a comprehensive study on whether existing sampling techniques distort traffic features critical for effective anomaly detection. We sampled packet traces captured from a Tier-1 IP backbone using four popular methods: random packet sampling, random flow sampling, smart sampling, and sample-and-hold. The sampled data is then used as input to detect two common classes of anomalies: volume anomalies and port scans. Since it is infeasible to enumerate all existing solutions, we study three representative algorithms: a wavelet-based volume anomaly detection algorithm and two port scan detection algorithms based on hypothesis testing. Our results show that all four sampling methods introduce fundamental bias that degrades the performance of the three detection schemes; however, the degradation curves are very different. We also identify the traffic features critical for anomaly detection and analyze how they are affected by sampling. Our work demonstrates the need for better measurement techniques, since anomaly detection operates on a drastically different information region, which is often overlooked by existing traffic accounting methods that target heavy-hitters.
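As an illustration only, the two simplest of the four sampling methods named in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the packet representation `(flow_id, size_bytes)`, the helper names, and the synthetic trace (a few large flows plus many single-packet flows standing in for scan probes) are all hypothetical and do not come from the paper.

```python
import random
from collections import Counter


def random_packet_sampling(packets, p, rng):
    """Keep each packet independently with probability p."""
    return [pkt for pkt in packets if rng.random() < p]


def random_flow_sampling(packets, p, rng):
    """Select each flow with probability p and keep all of its packets."""
    flow_ids = sorted({flow for flow, _ in packets})
    kept = {flow for flow in flow_ids if rng.random() < p}
    return [pkt for pkt in packets if pkt[0] in kept]


def flow_sizes(packets):
    """Count packets per flow."""
    return Counter(flow for flow, _ in packets)


def make_trace():
    """Hypothetical trace: 5 large flows and 1000 single-packet flows."""
    packets = []
    for i in range(5):
        packets += [(f"big-{i}", 1500)] * 1000
    for i in range(1000):
        packets.append((f"probe-{i}", 40))
    return packets


if __name__ == "__main__":
    rng = random.Random(0)
    trace = make_trace()
    by_packet = flow_sizes(random_packet_sampling(trace, 0.1, rng))
    by_flow = flow_sizes(random_flow_sampling(trace, 0.1, rng))
    print("flows seen with packet sampling:", len(by_packet))
    print("flows seen with flow sampling:  ", len(by_flow))
```

In this toy setting, packet sampling almost always still sees the large flows but thins out their packet counts and misses most single-packet flows, whereas flow sampling preserves the size of every flow it keeps. That contrast is one way to picture why methods tuned for heavy-hitters may lose the small-flow information that anomaly detection relies on.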
Year
2006
DOI
10.1145/1177080.1177102
Venue
Internet Measurement Conference
Keywords
anomaly detection, sampling technique, smart sampling, sampling method, random packet sampling, random flow sampling, portscan detection, effective anomaly detection, numerous anomaly detection algorithm, detection scheme, metric, random sampling, sampling, network management, internet protocol, traffic management, anomaly, traffic flow, sampling methods, security analysis, network analysis, bias
Field
Data mining, Anomaly detection, Traffic flow, Computer security, Computer science, Network packet, Computer network, Sampling (statistics), Network analysis, Traffic engineering, Statistical hypothesis testing, Wavelet
DocType
Conference
ISBN
1-59593-561-4
Citations
100
PageRank
4.85
References
16
Authors
5
Name, Order, Citations, PageRank
Jianning Mai, 1, 342, 18.63
Chen-Nee Chuah, 2, 2006, 161.34
Ashwin Sridharan, 3, 724, 55.79
Tao Ye, 4, 152, 8.07
Hui Zang, 5, 1052, 77.25