Title
Embedding vector generation based on function call graph for effective malware detection and classification
Abstract
The surge of malware poses a huge threat to cyberspace security. Existing malware analysis methods based on machine learning rely mainly on feature engineering. These methods need to extract many handcrafted features from the malware to improve accuracy, which increases the complexity of malware analysis. To solve this problem, this paper proposes GEMAL, a new malware analysis method based on the function call graph (FCG) and a graph embedding network. The FCG contains the structural information of the binary file and has been used in a variety of malware analysis research. Inspired by natural language processing tasks, we treat instructions as words and functions as sentences, so that semantic features can be extracted automatically with natural language processing methods. We use an attention-based graph embedding network to combine structural and semantic features into embedding vectors of malware for automatic and efficient malware analysis. We use two datasets to test the efficiency of GEMAL. One is a self-created dataset named WUFCG, which contains 70,188 real-world samples. The other is the public dataset of the Microsoft Malware Classification Challenge, which contains 10,868 samples. Experimental results show that GEMAL can detect real-world malware with 99.16% accuracy and classify malware families with a best accuracy of 99.81%.
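The pipeline outlined in the abstract (instructions as words, functions as sentences, attention over the call graph) can be illustrated with a minimal sketch. This is not the GEMAL implementation: the hash-based `token_vector` stands in for learned instruction embeddings, the single `propagate` step and the fixed attention `query` are simplifying assumptions, and every function name and the dimension `DIM` are hypothetical.

```python
import hashlib
import math

DIM = 8  # embedding size, chosen arbitrarily for this sketch


def token_vector(token, dim=DIM):
    """Deterministic stand-in for a learned instruction ('word') embedding."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Map each of the first `dim` bytes to a value in [-1, 1].
    return [b / 127.5 - 1.0 for b in digest[:dim]]


def function_vector(instructions, dim=DIM):
    """Average the instruction vectors of one function (a 'sentence')."""
    if not instructions:
        return [0.0] * dim
    vecs = [token_vector(t, dim) for t in instructions]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def propagate(features, edges):
    """One message-passing round: each function adds its callees' vectors."""
    out = {}
    for name, vec in features.items():
        total = list(vec)
        for callee in edges.get(name, []):
            total = [a + b for a, b in zip(total, features[callee])]
        out[name] = [math.tanh(x) for x in total]
    return out


def attention_pool(features, query):
    """Softmax-weighted sum of function vectors, scored against `query`."""
    names = sorted(features)
    scores = [sum(a * b for a, b in zip(features[n], query)) for n in names]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    weights = {n: e / norm for n, e in zip(names, exps)}
    embedding = [
        sum(weights[n] * features[n][i] for n in names)
        for i in range(len(query))
    ]
    return embedding, weights


def embed_graph(functions, edges, dim=DIM):
    """Turn a function call graph into one embedding vector."""
    feats = {n: function_vector(ins, dim) for n, ins in functions.items()}
    feats = propagate(feats, edges)
    query = [1.0 / math.sqrt(dim)] * dim  # fixed here; learned in a real model
    return attention_pool(feats, query)
```

Calling `embed_graph({"main": ["push", "mov", "call"], "helper": ["mov", "add", "ret"]}, {"main": ["helper"]})` returns a `DIM`-length vector together with the per-function attention weights. In a trained model, a vector of this kind would be passed to a classifier for detection or family classification.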
Year: 2022
DOI: 10.1007/s00521-021-06808-8
Venue: Neural Computing and Applications
Keywords: Malware detection, Malware classification, Function call graph, Graph embedding, Attention mechanism
DocType: Journal
Volume: 34
Issue: 11
ISSN: 0941-0643
Citations: 0
PageRank: 0.34
References: 18
Authors: 4
Name            Order   Citations   PageRank
Xiao-Wang Wu    1       0           0.34
Yan Wang        2       0           0.34
Yong Fang       3       191         31.43
Peng Jia        4       0           0.34