Malware clustering using suffix trees. - Citegraph

Paper Info

Title
Malware clustering using suffix trees.

Abstract
Clustering is an important problem in malware research, because the number of malicious samples that appear every day makes manual analysis impractical. Although these samples belong to a limited number of malware families, they are difficult to categorize automatically because obfuscation is involved. By extracting relevant features, we can apply clustering algorithms and then analyze only a few representatives from each cluster. However, classic clustering algorithms that compute the similarity between every pair of samples are slow on large collections, since the number of pairs grows quadratically with the number of samples. In this paper, the features are strings of operation codes (opcodes) extracted from the binary code of each sample. With a modified suffix tree data structure, we can find sufficiently long common substrings that correspond to portions of a program's code. These substrings are filtered against a database of known substrings so that common library code is ignored. Samples that share common substrings above a certain threshold are grouped into the same cluster. Our algorithm was tested on data extracted from real-world malware and produced high-quality clusters.
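The pipeline summarized in the abstract can be sketched in Python. It has four stages: opcode strings, shared substrings found with a suffix-tree-like index, filtering of known library code, and grouping into clusters. This is a minimal illustration under stated assumptions, not the authors' modified suffix tree.

The assumptions are:

- It uses a depth-limited generalized suffix trie, an uncompressed simplification of a suffix tree that indexes only the first `min_len` opcodes of every suffix.
- It models the database of known substrings as a hypothetical in-memory set `known` of `min_len`-opcode tuples.
- It interprets the clustering threshold as a minimum common-substring length.

The names `build_trie`, `shared_substrings`, `cluster`, `min_len` and `known` are all hypothetical. Two samples share a common substring of at least `min_len` opcodes exactly when they pass through the same trie node at depth `min_len`. Clusters can therefore be formed with a union-find structure, without comparing every pair of samples.

```python
from collections import defaultdict


class _Node:
    """A node in a depth-limited generalized suffix trie over opcode tokens."""

    __slots__ = ("children", "samples")

    def __init__(self):
        self.children = {}    # opcode -> child _Node
        self.samples = set()  # IDs of samples with a suffix passing through this node


def build_trie(samples, min_len):
    """Insert the first `min_len` opcodes of every suffix of every sample.

    Simplification (assumption): a real suffix tree is compressed and unbounded
    in depth; here only depth `min_len` matters, so deeper nodes are never built.
    """
    root = _Node()
    for sid, ops in samples.items():
        for start in range(len(ops) - min_len + 1):
            node = root
            for op in ops[start:start + min_len]:
                child = node.children.get(op)
                if child is None:
                    child = node.children[op] = _Node()
                child.samples.add(sid)
                node = child
    return root


def shared_substrings(root, min_len):
    """Yield (opcode tuple, sample IDs) for every node at depth `min_len`."""
    stack = [(root, ())]
    while stack:
        node, path = stack.pop()
        if len(path) == min_len:
            yield path, node.samples
            continue
        for op, child in node.children.items():
            stack.append((child, path + (op,)))


def _find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster(samples, min_len, known=frozenset()):
    """Group samples that share a non-library opcode substring of length >= min_len.

    `known` is a hypothetical stand-in for the database of known (library)
    substrings: a set of `min_len`-opcode tuples to ignore.
    """
    parent = {sid: sid for sid in samples}
    for path, sids in shared_substrings(build_trie(samples, min_len), min_len):
        if len(sids) < 2 or path in known:
            continue  # not shared, or common library code: ignore
        first, *rest = sids
        for other in rest:
            a, b = _find(parent, first), _find(parent, other)
            if a != b:
                parent[b] = a  # merge the two clusters
    groups = defaultdict(list)
    for sid in samples:
        groups[_find(parent, sid)].append(sid)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Toy opcode sequences (made up for illustration, not data from the paper).
    samples = {
        "a": ["push", "mov", "xor", "call", "ret"],
        "b": ["nop", "push", "mov", "xor", "call", "jmp"],
        "c": ["sub", "add", "cmp", "jne", "ret"],
        "d": ["lea", "sub", "add", "cmp", "jne"],
        "e": ["int3", "hlt", "int3", "hlt"],
    }
    print(cluster(samples, min_len=4))
    # [['a', 'b'], ['c', 'd'], ['e']]
```

Passing `known={("push", "mov", "xor", "call")}` marks the substring shared by `a` and `b` as library code, so those two samples end up in separate clusters. One design trade-off applies here. An uncompressed trie truncated at depth `min_len` can hold up to `min_len` nodes per input opcode. A true suffix tree (for example, one built with Ukkonen's algorithm) has size linear in the input and is not limited to a fixed depth.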

Year: 2016
DOI: 10.1007/s11416-014-0227-6
Venue: J. Computer Virology and Hacking Techniques
Field: Edit distance, Malware research, Data mining, Data structure, Substring, Tree traversal, Computer science, Suffix tree, Cluster analysis, Malware
DocType: Journal
Volume: 12
Issue: 1
ISSN: 2263-8733
Citations: 1
PageRank: 0.35
References: 14
Authors: 3

Authors

Name	Order	Citations	PageRank
Ciprian Oprisa	1	16	5.48
George Cabau	2	4	1.48
Gheorghe Sebestyen	3	5	6.25
