Abstract |
---|
We have collected several large-scale datasets in a number of passive measurement projects on an Internet backbone link belonging to a national university network. The datasets have been used in a variety of studies. These include general classification and characterization of Internet traffic properties, network security projects that detect and classify malicious traffic and hosts, and studies of the network-level properties of unsolicited e-mail (spam) traffic. The Antispam dataset alone contains traffic between more than 10 million e-mail addresses. In this paper we describe our datasets and our data collection methodology, including our experiences in collecting and processing data on a large scale. In particular, we have selected a dataset from an anti-spam project, which contains complete e-mail traffic, to show how highly privacy-sensitive data can be analyzed in practice. We show that it is possible to collect large datasets. We also show how to address various user-privacy issues, and we share our experiences of working with large datasets. |
Year | DOI | Venue |
---|---|---|
2011 | 10.1145/1978672.1978680 | BADGERS@EuroSys |
Keywords | Field | DocType |
internet traffic,large-scale multi-purpose datasets,internet backbone link,processing data,malicious traffic,large datasets,data collection methodology,privacy-sensitive data,complete e-mail traffic,large scale,million e-mail address,large-scale datasets,data collection,network security,spam | World Wide Web,Computer science,Network security,Internet measurement,Data collection methodology,Internet backbone,Internet traffic,User privacy | Conference |
Citations | PageRank | References |
5 | 0.48 | 15 |
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Farnaz Moradi | 1 | 33 | 6.22 |
Magnus Almgren | 2 | 270 | 39.17 |
Wolfgang John | 3 | 182 | 14.92 |
Tomas Olovsson | 4 | 188 | 21.68 |
Philippas Tsigas | 5 | 1200 | 99.58 |