Title
A Semi-supervised Learning Methodology for Malware Categorization using Weighted Word Embeddings
Abstract
Due to the vertiginous growth of malicious actors, malware is being crafted, distributed and propagated around the world with new and sophisticated techniques. Classical malware detection procedures, mostly based on signatures and heuristic searches, are now being replaced by machine learning (ML)-based solutions. However, some challenges remain. First, supervised approaches use anti-virus tags to create hand-crafted datasets, which results in a lack of taxonomy and in uncertainty about whether a given observation carries the proper label. Second, off-line and feed-forward approaches may require complex and time-consuming feature extraction. In this work, we propose a novel method that reinforces malware characterization by capturing rich relevance and contextual patterns in an n-dimensional weighted word embedding vector (WEV) space. Results show that by clustering similar WEVs via unsupervised learning, malware can be categorized into four major families, improving detection while using fewer resources.
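The abstract does not spell out the weighting scheme or the clustering algorithm. The following is therefore only a minimal sketch under stated assumptions. It assumes TF-IDF weights over Windows API call traces, averages the per-call embedding vectors into one WEV per sample, and groups the WEVs with plain k-means using k = 4, one cluster per family. The toy 2-D embeddings, the API traces and all function names (`idf_weights`, `weighted_embedding`, `kmeans`) are hypothetical. A real pipeline would learn n-dimensional vectors with word2vec on API call sequences.

```python
import math
from collections import Counter

# Hypothetical toy 2-D embeddings for Windows API calls. A real pipeline would
# learn n-dimensional vectors with word2vec on API call sequences instead.
EMBEDDINGS = {
    "CreateFileW": (1.0, 0.1),
    "WriteFile": (0.9, -0.1),
    "connect": (0.1, 1.0),
    "send": (-0.1, 0.9),
    "RegOpenKeyExW": (-0.9, -0.1),
    "RegSetValueExW": (-1.0, 0.1),
    "VirtualAllocEx": (0.1, -1.0),
    "WriteProcessMemory": (-0.1, -0.9),
    "GetProcAddress": (0.0, 0.0),
}


def idf_weights(docs):
    """Smoothed inverse document frequency: ln(N / df) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def weighted_embedding(doc, idf, emb):
    """TF-IDF-weighted average of the embeddings of the API calls in one trace."""
    tf = Counter(doc)
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for term, count in tf.items():
        w = (count / len(doc)) * idf[term]
        weight_sum += w
        for i, x in enumerate(emb[term]):
            total[i] += w * x
    return [x / weight_sum for x in total]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical API call traces, two per behaviour (file, network, registry,
# process injection); GetProcAddress appears in every trace, so it gets the
# lowest IDF weight.
traces = [
    ["CreateFileW", "WriteFile", "WriteFile", "GetProcAddress"],
    ["CreateFileW", "CreateFileW", "WriteFile", "GetProcAddress"],
    ["connect", "send", "send", "GetProcAddress"],
    ["connect", "connect", "send", "GetProcAddress"],
    ["RegOpenKeyExW", "RegSetValueExW", "GetProcAddress"],
    ["RegSetValueExW", "RegSetValueExW", "RegOpenKeyExW", "GetProcAddress"],
    ["VirtualAllocEx", "WriteProcessMemory", "GetProcAddress"],
    ["VirtualAllocEx", "VirtualAllocEx", "WriteProcessMemory", "GetProcAddress"],
]
idf = idf_weights(traces)
wevs = [weighted_embedding(t, idf, EMBEDDINGS) for t in traces]
labels = kmeans(wevs, k=4)
print(labels)  # -> [0, 0, 2, 2, 1, 1, 3, 3]
```

Farthest-first seeding is used here only to make the toy run deterministic. In practice, k-means++ seeding or a library implementation would be the more usual choice. The cluster indices themselves are arbitrary; only the grouping of traces matters.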
Year
2019
DOI
10.1109/EuroSPW.2019.00033
Venue
2019 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)
Keywords
malware, windows-api, machine-learning, word2vec, clustering
Field
Categorization, Semi-supervised learning, Computer science, Theoretical computer science, Feature extraction, Unsupervised learning, Artificial intelligence, Word2vec, Word embedding, Malware, Cluster analysis, Machine learning
DocType
Conference
ISBN
978-1-7281-3027-9
Citations
1
PageRank
0.34
References
14
Authors
7