Title
An Empirical Evaluation of Automated Machine Learning Techniques for Malware Detection
Abstract
ABSTRACTNowadays, it is increasingly difficult even for a machine learning expert to incorporate all of the recent best practices into their modeling due to the fast development of state-of-the-art machine learning techniques. For the applications that handle big data sets, the complexity of the problem of choosing the best performing model with the best hyper-parameter setting becomes harder. In this work, we present an empirical evaluation of automated machine learning (AutoML) frameworks or techniques that aim to optimize hyper-parameters for machine learning models to achieve the best achievable performance. We apply AutoML techniques to the malware detection problem, which requires achieving the true positive rate as high as possible while reducing the false positive rate as low as possible. We adopt two AutoML frameworks, namely AutoGluon-Tabular and Microsoft Neural Network Intelligence (NNI) to optimize hyper-parameters of a Light Gradient Boosted Machine (LightGBM) model for classifying malware samples. We carry out extensive experiments on two data sets. The first data set is a publicly available data set (EMBER data set), that has been used as a benchmarking data set for many malware detection works. The second data set is a private data set we have acquired from a security company that provides recently-collected malware samples. We provide empirical analysis and performance comparison of the two AutoML frameworks. The experimental results show that AutoML frameworks could identify the set of hyper-parameters that significantly outperform the performance of the model with the known best performing hyper-parameter setting and improve the performance of a LightGBM classifier with respect to the true positive rate from $86.8%$ to $90%$ at $0.1%$ of false positive rate on EMBER data set and from $80.8%$ to $87.4%$ on the private data set.
Year
DOI
Venue
2021
10.1145/3445970.3451155
CODASPY
DocType
Citations 
PageRank 
Conference
1
0.39
References 
Authors
0
3
Name
Order
Citations
PageRank
Partha Pratim Kundu110.39
Lux Anatharaman210.39
Tram Truong-Huu310.39