Title
Detecting anomalous packets in network transfers: investigations using PCA, autoencoder and isolation forest in TCP
Abstract
Large-scale scientific workflows rely heavily on high-performance file transfers. These transfers require strict quality parameters such as guaranteed bandwidth and no packet loss or data duplication. To ensure successful file transfers, methods such as predetermined thresholds and statistical analysis are needed to detect abnormal patterns. Network administrators routinely monitor and analyze network data to diagnose and alleviate these problems, making decisions based on their experience. However, as networks grow and become more complex, the need to monitor large data files and process them quickly makes it difficult to identify and rectify errors. Abnormal file transfers have been classified by simply setting alert thresholds via tools such as perfSONAR and TCP statistics (Tstat). This paper investigates the feasibility of unsupervised feature extraction for identifying network anomaly patterns with three unsupervised classification methods: principal component analysis (PCA), autoencoder and isolation forest. We collect file transfer statistics from two experiment sets, synthetic iPerf-generated traffic and 1000 Genome workflow runs, with synthetically introduced anomalies. Our results show that while PCA and a simple autoencoder find it difficult to detect clusters, the tree-based isolation forest is able to identify anomalous packets by breaking down TCP traces into tree classes early.
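The isolation forest idea mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the three flow features (throughput in Mbps, retransmission count, round-trip time in ms), the synthetic values, and all function names are assumptions chosen for illustration. The sketch isolates points with random axis-aligned splits and scores each point by its average path length, so records that are isolated after only a few splits receive scores closer to 1.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively partition rows with random axis-aligned splits."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    # Only features that still vary inside this node can be split.
    spans = [(min(r[j] for r in rows), max(r[j] for r in rows))
             for j in range(len(rows[0]))]
    candidates = [j for j, (lo, hi) in enumerate(spans) if lo < hi]
    if not candidates:
        return {"size": len(rows)}
    feature = rng.choice(candidates)
    lo, hi = spans[feature]
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(x, node, depth=0):
    """Depth at which x lands in a leaf, plus a correction for unsplit leaf points."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def fit_forest(rows, n_trees=100, sample_size=64, seed=0):
    """Build n_trees trees, each on a random subsample (needs at least 2 rows)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    forest = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
              for _ in range(n_trees)]
    return forest, sample_size


def anomaly_score(x, forest, sample_size):
    """Score in (0, 1); shorter average paths give scores closer to 1."""
    mean_path = sum(path_length(x, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_path / c_factor(sample_size))


# Hypothetical flow records: (throughput_mbps, retransmissions, rtt_ms).
data_rng = random.Random(42)
normal = [(data_rng.gauss(940, 15), abs(data_rng.gauss(2, 1)), data_rng.gauss(20, 2))
          for _ in range(200)]
anomalous = [(data_rng.gauss(300, 20), data_rng.gauss(60, 5), data_rng.gauss(80, 5))
             for _ in range(5)]
flows = normal + anomalous  # the last 5 records are the injected anomalies

forest, psi = fit_forest(flows)
scores = [anomaly_score(f, forest, psi) for f in flows]
ranked = sorted(range(len(flows)), key=lambda i: scores[i], reverse=True)
print("highest-scoring record indices:", ranked[:5])
```

Because each split threshold is drawn uniformly between a feature's minimum and maximum within the node, the method needs no feature scaling. In practice a maintained library implementation would normally be preferred over a hand-rolled one like this.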
Year: 2020
DOI: 10.1007/s10994-020-05870-y
Venue: Machine Learning
Keywords: PCA, Autoencoders, Isolation forest, Network traffic
DocType: Journal
Volume: 109
Issue: 5
ISSN: 0885-6125
Citations: 2
PageRank: 0.37
References: 23
Authors: 5
Name | Order | Citations | PageRank
Mariam Kiran | 1 | 121 | 17.83
Cong Wang | 2 | 28 | 4.58
George Papadimitriou | 3 | 14 | 2.62
Anirban Mandal | 4 | 550 | 40.69
Ewa Deelman | 5 | 5948 | 420.48