Title
Easy Data Augmentation for Improved Malware Detection: A Comparative Study
Abstract
Artificial data generation is important for improving research outcomes in deep learning. The variational auto-encoder (VAE), one of the most popular and promising generative models, generates synthetic data that can be used to train classifiers more accurately. Artificial data can also be generated via easy data augmentation (EDA) techniques. EDA is a simple method for boosting performance on text classification tasks and, unlike generative models such as VAE, it does not require model training. Malware detection is the task of determining whether malicious software is present on a host system and diagnosing the type of attack. Without an adequate amount of training data, the detection efficiency for malicious programs decreases. In this study, EDA was applied to malware detection, and the two artificial data generation methods (EDA and VAE) were compared. Both methods were used to generate artificial training data for malware detection, which was then used to boost a malware detection classifier based on a long short-term memory recurrent neural network (LSTM RNN). Experimental results show that adding synthetic malware samples generated by EDA to the training data improved the accuracy of the LSTM RNN classifier by 1.76%, compared with a 0.98% improvement with VAE. In addition, because EDA requires no separate training process, it generated malware training data 10 times faster than VAE. Further, we performed extensive ablation studies and suggested parameters for practical use.
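To make the training-free nature of EDA concrete, the following is a minimal sketch of two of the standard EDA operations (random swap and random deletion) applied to a token sequence. It is not the authors' implementation: the abstract does not state how malware samples are tokenized, so the API-call-style tokens, the function names, and the parameters `alpha` and `n_aug` are all assumptions made for illustration.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with two random positions swapped, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p; never return an empty list."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(tokens, n_aug=4, alpha=0.1, seed=0):
    """Generate n_aug synthetic variants, alternating swap and deletion.

    alpha (assumed here) controls both the number of swaps and the deletion rate.
    """
    rng = random.Random(seed)
    n_swaps = max(1, int(alpha * len(tokens)))
    samples = []
    for k in range(n_aug):
        if k % 2 == 0:
            samples.append(random_swap(tokens, n_swaps, rng))
        else:
            samples.append(random_deletion(tokens, alpha, rng))
    return samples


# Hypothetical token sequence standing in for one malware sample.
sample = ["CreateFileW", "WriteFile", "RegSetValueExW", "CreateProcessW", "CloseHandle"]
augmented = augment(sample)
```

Each variant is produced by simple list operations, so no model has to be fitted before generation; a generative model such as VAE, by contrast, must be trained first.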
Year
2021
DOI
10.1109/BigComp51126.2021.00048
Venue
2021 IEEE International Conference on Big Data and Smart Computing (BigComp)
Keywords
Easy data augmentation, Variational Auto-Encoders, Deep Learning, Long short-term memory recurrent neural network, Malware Detection, Data Augmentation
DocType
Conference
ISSN
2375-933X
ISBN
978-1-7281-8925-3
Citations
0
PageRank
0.34
References
0
Authors
2
Name            Order  Citations  PageRank
Jangseong Bae   1      0          0.34
Changki Lee     2      279        26.18