Title
Malware clustering using suffix trees.
Abstract
Clustering is an important problem in malware research, as the number of malicious samples that appear every day makes manual analysis impractical. Although these samples belong to a limited number of malware families, it is difficult to categorize them automatically as obfuscation is involved. By extracting relevant features we can apply clustering algorithms, then only analyze a couple of representatives from each cluster. However, classic clustering algorithms that compute the similarity between each pair of samples are slow when a large collection is involved. In this paper, the features will be strings of operation codes extracted from the binary code of each sample. With a modified suffix tree data structure we can find long enough substrings that correspond to portions of a program's code. These substrings must be filtered against a database of known substrings so that common library code will be ignored. The items that have common substrings above a certain threshold will be grouped into the same cluster. Our algorithm was tested with data extracted from real-world malware and constructed quality clusters.
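The grouping step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it swaps the modified suffix tree for fixed-length opcode windows stored in an inverted index, and the names `windows`, `cluster_samples`, `known_substrings`, `length`, and `threshold` are all hypothetical. It keeps the three ideas from the abstract: substrings of opcodes as features, filtering against a database of known (library) substrings, and grouping samples whose shared substrings reach a threshold.

```python
from collections import defaultdict
from itertools import combinations


def windows(opcodes, length):
    """Return the set of all contiguous opcode runs of the given length."""
    return {tuple(opcodes[i:i + length]) for i in range(len(opcodes) - length + 1)}


def cluster_samples(samples, known_substrings=frozenset(), length=4, threshold=2):
    """Group samples that share at least `threshold` non-library substrings.

    samples: dict mapping a sample name to its list of opcodes.
    known_substrings: opcode tuples to ignore (assumed library-code database).
    """
    # Inverted index: substring -> samples containing it, skipping known code.
    index = defaultdict(set)
    for name, opcodes in samples.items():
        for sub in windows(opcodes, length) - known_substrings:
            index[sub].add(name)

    # Count shared substrings only for pairs that co-occur in some index bucket.
    shared = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1

    # Union-find: merge samples whose shared count reaches the threshold.
    parent = {name: name for name in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in shared.items():
        if count >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for name in samples:
        clusters[find(name)].add(name)
    return sorted(sorted(members) for members in clusters.values())
```

Pairs that never share a substring are never compared, which is the point of indexing by substring rather than computing all pairwise similarities. A substring present in many samples still produces many pairs within its bucket, so this sketch does not claim the efficiency of the suffix tree approach.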
Year
2016
DOI
10.1007/s11416-014-0227-6
Venue
J. Computer Virology and Hacking Techniques
Field
Edit distance, Malware research, Data mining, Data structure, Substring, Tree traversal, Computer science, Suffix tree, Cluster analysis, Malware
DocType
Journal
Volume
12
Issue
1
ISSN
2263-8733
Citations
1
PageRank
0.35
References
14
Authors
3
Name                Order  Citations  PageRank
Ciprian Oprisa      1      16         5.48
George Cabau        2      4          1.48
Gheorghe Sebestyen  3      5          6.25