Title
Embedded Malware Detection Using Markov n-Grams
Abstract
Embedded malware is a recently discovered security threat that allows malcode to be hidden inside a benign file. It has been shown that embedded malware is not detected by commercial antivirus software even when the malware signature is present in the antivirus database. In this paper, we present a novel anomaly detection scheme to detect embedded malware. We first analyze byte sequences in benign files to show that they generally exhibit a first-order dependence structure. Consequently, conditional n-grams provide a more meaningful representation of a file's statistical properties than traditional n-grams. To capture and leverage this correlation structure, we model the conditional distributions as Markov n-grams. We then use an information-theoretic measure, the entropy rate, to quantify changes in the Markov n-gram distributions observed in a file. We show that the entropy rate of Markov n-grams is significantly perturbed at malcode embedding locations and can therefore act as a robust feature for embedded malware detection. We evaluate the proposed Markov n-gram detector on a comprehensive malware dataset consisting of more than 37,000 malware samples and 1,800 benign samples of six well-known filetypes. We show that the Markov n-gram detector provides better detection and false positive rates than the only existing embedded malware detection scheme.
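As an illustration of the idea described in the abstract, the following is a minimal sketch of how the entropy rate of a first-order Markov model over bytes could be estimated per window of a file. It is not the authors' implementation: the function names, the empirical estimator, and the 1024-byte window size are assumptions made here for illustration, and the abstract does not specify how perturbations are thresholded.

```python
import math
from collections import Counter


def markov_entropy_rate(block: bytes) -> float:
    """Estimate the entropy rate (bits per byte) of a first-order Markov
    model fitted to `block`, using empirical transition counts.

    H = - sum_{i,j} P(i, j) * log2 P(j | i)
    where P(i, j) is the relative frequency of the byte pair (i, j) and
    P(j | i) is the relative frequency of j among successors of i.
    """
    if len(block) < 2:
        return 0.0
    pair_counts = Counter(zip(block, block[1:]))  # counts of (previous, current)
    prev_counts = Counter(block[:-1])             # counts of each previous byte
    total = len(block) - 1                        # number of transitions
    rate = 0.0
    for (prev, _cur), count in pair_counts.items():
        rate -= (count / total) * math.log2(count / prev_counts[prev])
    return rate


def windowed_entropy_rates(data: bytes, window: int = 1024) -> list:
    """Entropy rate of each non-overlapping window (window size is an
    assumed, hypothetical parameter, not taken from the paper)."""
    return [
        markov_entropy_rate(data[i:i + window])
        for i in range(0, len(data), window)
    ]
```

Under these assumptions, a detector would look for windows whose entropy rate deviates sharply from that of the surrounding windows; how large a deviation counts as anomalous is left open here.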
Year: 2008
DOI: 10.1007/978-3-540-70542-0_5
Venue: DIMVA
Keywords: embedded malware detection scheme, markov n-grams, embedded malware detection, better detection, benign file, embedded malware, malware signature, comprehensive malware dataset, entropy rate, malware detection, malware sample, anomaly detection, conditional distribution, false positive rate
Field: Anomaly detection, Data mining, Byte, Entropy rate, Conditional probability distribution, Embedding, Computer science, Markov chain, Software, Malware
DocType: Conference
Volume: 5137
ISSN: 0302-9743
Citations: 34
PageRank: 1.83
References: 4
Authors: 3
Name                Order   Citations   PageRank
M. Zubair Shafiq    1       546         43.41
Syed Ali Khayam     2       450         33.86
Muddassar Farooq    3       1221        83.47