Abstract |
---|
The sheer volume of new malware found each day is growing at an exponential pace. This growth has created a need for automatic malware triage techniques that determine what malware is similar, what malware is unique, and why. In this paper, we present BitShred, a system for large-scale malware similarity analysis and clustering, and for automatically uncovering semantic inter- and intra-family relationships within clusters. The key idea behind BitShred is using feature hashing to dramatically reduce the high-dimensional feature spaces that are common in malware analysis. Feature hashing also allows us to mine correlated features between malware families and samples using co-clustering techniques. Our evaluation shows that BitShred speeds up typical malware triage tasks by up to 2,365x and uses up to 82x less memory on a single CPU, all with comparable accuracy to previous approaches. We also develop a parallelized version of BitShred, and demonstrate scalability within the Hadoop framework. |
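
The abstract's central idea is to compress a huge feature space with feature hashing and then compare samples cheaply. The minimal Python sketch below illustrates that idea. It assumes features are overlapping byte n-grams hashed into a fixed-size bit vector, and that similarity is the Jaccard index computed with bitwise AND/OR and popcounts. The n-gram length, vector size, SHA-1 as the hash, and the helper names `feature_bits`, `popcount`, and `jaccard` are hypothetical choices for illustration. They are not BitShred's published parameters or code.

```python
import hashlib

# Hypothetical illustration parameters, not BitShred's published settings.
NGRAM_LEN = 4          # length of each byte n-gram feature
VECTOR_BITS = 1 << 16  # size m of the hashed bit vector


def feature_bits(data: bytes, n: int = NGRAM_LEN, m: int = VECTOR_BITS) -> int:
    """Feature-hash every overlapping byte n-gram of `data` into an m-bit vector.

    The vector is stored as a Python int; bit h is set if some n-gram hashes to h.
    """
    bits = 0
    for i in range(len(data) - n + 1):
        digest = hashlib.sha1(data[i:i + n]).digest()
        # Map the n-gram to one of m positions; distinct n-grams may collide.
        h = int.from_bytes(digest[:8], "big") % m
        bits |= 1 << h
    return bits


def popcount(x: int) -> int:
    """Count set bits (portable alternative to int.bit_count, which needs 3.10+)."""
    return bin(x).count("1")


def jaccard(a: int, b: int) -> float:
    """Jaccard similarity of two hashed feature sets: |a AND b| / |a OR b|."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # treat two empty feature sets as identical
    return popcount(a & b) / union


if __name__ == "__main__":
    # Two made-up 9-byte sequences that differ only in the byte at index 5.
    x = feature_bits(b"\x55\x8b\xec\x83\xec\x10\x53\x56\x57")
    y = feature_bits(b"\x55\x8b\xec\x83\xec\x20\x53\x56\x57")
    print(jaccard(x, x))  # prints 1.0
    print(round(jaccard(x, y), 3))
```

In the usage example, the two sequences share 2 of their 10 distinct 4-grams, so the second score is about 0.2 (exactly 0.2 unless two different n-grams hash to the same bit). Because every sample becomes a fixed-length vector no matter how many distinct features it contains, memory per sample is bounded by m bits. Each pairwise comparison then reduces to a few bitwise operations. The trade-off is that hash collisions can bias the estimated similarity.
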
Year | DOI | Venue |
---|---|---|
2011 | 10.1145/2046707.2046742 | ACM Conference on Computer and Communications Security |
Keywords | Field | DocType |
---|---|---|
hadoop framework,typical malware triage task,correlated feature,malware analysis,semantic analysis,bitshred speed,new malware,scalable triage,high-dimensional feature space,malware family,automatic malware triage technique,large-scale malware similarity analysis,co clustering,feature hashing,feature space | Data mining,Similarity analysis,Computer science,Feature hashing,Triage,Biclustering,Cluster analysis,Malware,Scalability,Malware analysis | Conference |
Citations | PageRank | References |
---|---|---|
112 | 3.57 | 27 |
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jiyong Jang | 1 | 297 | 16.23 |
David Brumley | 2 | 2940 | 142.75 |
Shobha Venkataraman | 3 | 1027 | 51.93 |