Title
Large Scale Enrichment and Statistical Cyber Characterization of Network Traffic
Abstract
Modern network sensors continuously produce enormous quantities of raw data that are beyond the capacity of human analysts. Cross-correlation of network sensors increases this challenge by enriching every network event with additional metadata. These large volumes of enriched network data present opportunities to statistically characterize network traffic and quickly answer a key question: “What are the primary cyber characteristics of my network data?” The Python GraphBLAS and PyD4M analysis frameworks enable anonymized statistical analysis to be performed quickly and efficiently on very large network data sets. This approach is tested using billions of anonymized network data samples from the largest Internet observatory (CAIDA Telescope) and tens of millions of anonymized records from the largest commercially available background enrichment capability (GreyNoise). The analysis confirms that most of the enriched variables follow expected heavy-tail distributions and that a large fraction of the network traffic is due to a small number of cyber activities. This information can simplify the cyber analysts' task by enabling prioritization of cyber activities based on statistical prevalence.
Year
2022
DOI
10.1109/HPEC55821.2022.9926397
Venue
2022 IEEE High Performance Extreme Computing Conference (HPEC)
Keywords
Cybersecurity, High Performing Computing, Big Data, Networks Scanning, Dimensional Analysis, Internet Modeling, Packet Capture, Streaming Graphs
DocType
Conference
ISSN
2377-6943
ISBN
978-1-6654-9787-9
Citations
0
PageRank
0.34
References
22
Authors
12
Name, Order, Citations, PageRank
Ivan Kawaminami, 1, 0, 0.34
Arminda Estrada, 2, 0, 0.34
Youssef Elsakkary, 3, 0, 0.34
Hayden Jananthan, 4, 14, 4.78
Aydin Buluc, 5, 1057, 67.49
Timothy A. Davis, 6, 1447, 144.19
Daniel Grant, 7, 1, 0.69
Michael J. Jones, 8, 11341, 927.21
Chad Meiners, 9, 5, 1.75
Andrew Morris, 10, 1, 0.69
Sandeep Pisharody, 11, 5, 1.75
Jeremy Kepner, 12, 606, 61.58