Title
Learning on Class Imbalanced Data to Classify Peer-to-Peer Applications in IP Traffic using Resampling Techniques
Abstract
In many applications, one class of data is represented by a large number of examples while the other is represented by only a few. For instance, in our previous work on identifying peer-to-peer (P2P) Internet traffic, we observed that only about 30% of examples can be labeled as "P2P" using a port-based heuristic rule, and even fewer examples will be labeled this way in the future as more and more P2P applications use dynamic ports. In this paper, we study the effect of three resampling techniques on balancing the class distribution when training C4.5 and neural networks to identify P2P traffic. The experimental data were captured at our campus gateway. The experiments use nine datasets with different percentages of "P2P" examples and six datasets of different sizes with the actual percentage of about 30% "P2P" examples. The results show that resampling techniques are effective and stable, and that random over-sampling is a good choice for P2P traffic identification when classification performance and time complexity are considered together.
Year
2009
DOI
10.1109/IJCNN.2009.5178804
Venue
IJCNN
Keywords
different percentage,ip networks,internet traffic,actual percentage,resampling technique,p2p traffic,experimental data,class distribution,class imbalanced data,different size,internet,ip traffic,p2p traffic identification,p2p application,telecommunication traffic,peer-to-peer computing,port-based heuristic rule,peer-to-peer application,class imbalanced data learning,neural nets,bandwidth,predictive models,accuracy,labeling,neural networks,p2p,classification algorithms,neural network,artificial neural networks,data mining,time complexity
Field
Data mining,Heuristic,Peer-to-peer,Computer science,Artificial intelligence,Artificial neural network,Statistical classification,Time complexity,Resampling,Internet traffic,Machine learning,The Internet
DocType
Conference
ISSN
1098-7576
E-ISBN
978-1-4244-3553-1
ISBN
978-1-4244-3553-1
Citations
5
PageRank
0.46
References
14
Authors
3
Name
Order
Citations
PageRank
Weicai Zhong138126.14
Bijan Raahemi215522.29
Jing Liu31043115.54