Title
Controlling False Alarm/Discovery Rates in Online Internet Traffic Flow Classification
Abstract
Existing Internet traffic classification techniques achieve impressively low misclassification rates, but do not provide performance guarantees for particular classes of interest. In this paper, we propose two novel online traffic classifiers — one based on Neyman-Pearson classification and one based on the Learning Satisfiability (LSAT) framework — that can provide class-specific performance guarantees on the false alarm and false discovery rates, respectively. We also present a preprocessor for our classifiers that predicts, after the reception of only a small number of packets, whether a flow will be 'large' (as defined by a network operator). Only these resource-intensive flows are passed to the classifier, greatly reducing the computation burden imposed. We validate our methodology by testing our approaches using traff ic data provided by an ISP. I. INTRODUCTION Online Internet traffic classification involves the automatic association of a user-defined class to a traffic flow after the arrival of only a small number of its packets. Accurate traf- fic classification enables the provision of application-specific Quality-of-Service (QoS) guarantees (1) and simplifies the enforcement of service level agreements. ISPs can use the flow labels to prioritize time-sensitive traffic (e.g., VoIP and video- conferencing) and block (or throttle) undesirable traffic (e.g., peer-to-peer). The classification results also provide useful statistics for network provisioning and security assessment. Unfortunately, classifying Internet traffic is not a trivial task. Port numbers assigned by the Internet Assigned Numbers Authority (IANA) are not a good indicator of the application, because application developers frequently ignore the assign- ments, and use arbitrary (or even random) port numbers. Recently, machine learning techniques, including classifica- tion and clustering methods, have been proposed for traffic classification, with the underlying features derived from flow statistics (1)-(10). One important attribute missing from the classification methods proposed thus far is the provision of application/class-specific performance guarantees, particularly bounds on false alarm rates and false discovery rates .I n this paper, we address this deficiency, proposing online traffic classifiers that are designed to satisfy constraints on these performance metrics. The false alarm rate for class i (FAR i) refers to the expected fraction of the flows that do not belong to traffic class i that are incorrectly classified as belonging to i. This concept can be extended to a pairwise false alarm rate FAR ij for classes i and j, which specifies the expected proportion of the flows belonging to class j that are incorrectly labeled as class i by the classifier. The false alarm rate differs from the false discovery rate, FDRi, which is the expected fraction of incorrectly classified flows among all traffic flows classified as class i. This can also be extended to a pairwise false discovery rate, FDRij, which is the expected fraction of all flows classified as class i which do in fact belong to class j. A network operator can select which of these constraints is appropriate according to the specific goal of the classification procedure. To illustrate the need for controlling both of these performance metrics, consider a scenario where 1100 flows are simultaneously present in a network, with the following actual breakdown: 900 HTTP flows, 100 peer-to-peer (P2P) flows and 100 VoIP flows. Here the HTTP class dominates and it is possible to obtain a relatively high overall classification accuracy (low misclassification rate) by simply classifying the majority of flows as HTTP. However, such a strategy leads to unacceptably low detection rates for the VoIP and P2P classes. Any procedure, such as prioritization or rate- limiting, applied to flows that such a classifier labels as VoIP or P2P, will be ineffective. It is clear that optimizing the global misclassification rate is insufficient. Our approach allows the operator to set class-specific per- formance guarantees. For example, suppose that the operator wants to give higher priority to VoIP traffic and wants to bound the number of flows that are incorrectly prioritized. By specifying that the pairwise FAR of each pair of classes must not exceed 5%, the resultant classifier ensures that at most 50 non-VoIP flows would be classified as VoIP, and no more than 10 of the true VoIP flows are incorrectly identified as HTTP or P2P. Alternatively, suppose that the operator wishes to block P2P traffic, but ensure that of the blocked flows only a small percentage are incorrectly blocked. By training a classifier with the P2P FDR constraint set to 1%, the operator can guarantee that of every 100 blocked flows, at most one is incorrectly blocked. A. Contribution
Year
DOI
Venue
2009
10.1109/INFCOM.2009.5061976
Rio de Janeiro
Keywords
Field
DocType
Internet,pattern classification,quality of service,telecommunication traffic,Neyman-Pearson classification,false alarm rates,false discovery rates,learning satisfiability framework,online Internet traffic flow classification,quality-of-service
Data mining,False alarm,Computer science,Network packet,Support vector machine,Quality of service,Computer network,Preprocessor,Classifier (linguistics),Internet traffic,The Internet
Conference
ISSN
ISBN
Citations 
0743-166X E-ISBN : 978-1-4244-3513-5
978-1-4244-3513-5
4
PageRank 
References 
Authors
0.58
45
3
Name
Order
Citations
PageRank
Daniel Nechay140.58
Yvan Pointurier213520.65
Mark Coates361955.55