Title
BitShred: feature hashing malware for scalable triage and semantic analysis
Abstract
The sheer volume of new malware found each day is growing at an exponential pace. This growth has created a need for automatic malware triage techniques that determine what malware is similar, what malware is unique, and why. In this paper, we present BitShred, a system for large-scale malware similarity analysis and clustering, and for automatically uncovering semantic inter- and intra-family relationships within clusters. The key idea behind BitShred is using feature hashing to dramatically reduce the high-dimensional feature spaces that are common in malware analysis. Feature hashing also allows us to mine correlated features between malware families and samples using co-clustering techniques. Our evaluation shows that BitShred speeds up typical malware triage tasks by up to 2,365x and uses up to 82x less memory on a single CPU, all with comparable accuracy to previous approaches. We also develop a parallelized version of BitShred, and demonstrate scalability within the Hadoop framework.
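The key idea named in the abstract, feature hashing, can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of byte n-grams as features, the SHA-1 based hash, the fingerprint size `m_bits`, and the Jaccard-style comparison are all assumptions made for illustration, and the helper names `ngrams`, `fingerprint`, and `similarity` are hypothetical.

```python
import hashlib


def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of byte n-grams in data (assumed feature type)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def fingerprint(features, m_bits: int = 1 << 15) -> int:
    """Hash each feature to one of m_bits positions and set that bit.

    The fingerprint is held in a Python int used as a fixed-width bit vector,
    so its size is independent of the number of distinct features.
    """
    fp = 0
    for feature in features:
        digest = hashlib.sha1(feature).digest()
        position = int.from_bytes(digest[:8], "big") % m_bits
        fp |= 1 << position
    return fp


def similarity(a: int, b: int) -> float:
    """Jaccard-style similarity of two fingerprints: |a AND b| / |a OR b|."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return bin(a & b).count("1") / union


if __name__ == "__main__":
    sample_a = b"push ebp; mov ebp, esp; sub esp, 0x40; call decrypt"
    sample_b = b"push ebp; mov ebp, esp; sub esp, 0x48; call decrypt"
    fp_a = fingerprint(ngrams(sample_a))
    fp_b = fingerprint(ngrams(sample_b))
    print(f"similarity: {similarity(fp_a, fp_b):.3f}")
```

Because every sample maps to a bit vector of the same fixed width, comparing two samples reduces to a handful of bitwise operations, which is one plausible reading of how hashing shrinks a high-dimensional feature space. Hash collisions can make unrelated samples look slightly more similar than they are; a larger `m_bits` reduces that effect at the cost of memory.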
Year: 2011
DOI: 10.1145/2046707.2046742
Venue: ACM Conference on Computer and Communications Security
Keywords: hadoop framework, typical malware triage task, correlated feature, malware analysis, semantic analysis, bitshred speed, new malware, scalable triage, high-dimensional feature space, malware family, automatic malware triage technique, large-scale malware similarity analysis, co clustering, feature hashing, feature space
Field: Data mining, Similarity analysis, Computer science, Feature hashing, Triage, Biclustering, Cluster analysis, Malware, Scalability, Malware analysis
DocType: Conference
Citations: 112
PageRank: 3.57
References: 27
Authors: 3
Name                  Order  Citations  PageRank
Jiyong Jang           1      297        16.23
David Brumley         2      2940       142.75
Shobha Venkataraman   3      1027       51.93