Title
Malware detection using statistical analysis of byte-level file content
Abstract
Commercial anti-virus software is unable to provide protection against newly launched (a.k.a. "zero-day") malware. In this paper, we propose a novel malware detection technique based on the analysis of byte-level file content. The novelty of our approach, compared with existing content-based mining schemes, is that it does not memorize specific byte sequences or strings appearing in the actual file content. Our technique is non-signature-based and therefore has the potential to detect previously unknown and zero-day malware. We compute a wide range of statistical and information-theoretic features in a block-wise manner to quantify the byte-level file content. We leverage standard data mining algorithms to classify the content of every block as normal or potentially malicious. Finally, we correlate the block-wise classification results of a given file to categorize it as benign or malware. Since the proposed scheme operates on byte-level file content, it does not require any a priori information about the filetype. We have tested the proposed technique using a benign dataset comprising six different filetypes (DOC, EXE, JPG, MP3, PDF and ZIP) and a malware dataset comprising six different malware types (backdoor, trojan, virus, worm, constructor and miscellaneous). We also compare our technique with existing data-mining-based malware detection techniques. The results of our experiments show that the proposed non-signature-based technique surpasses the existing techniques and achieves more than 90% detection accuracy.
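The block-wise pipeline described in the abstract (split a file into blocks, compute statistical features per block, classify each block, then correlate the block labels into a file-level verdict) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 1024-byte block size, the four features shown (mean, standard deviation, Shannon entropy, distinct byte count), the toy entropy-threshold block classifier standing in for a trained data-mining model, and the fraction-of-suspicious-blocks correlation rule are all hypothetical choices, as are the names `split_blocks`, `block_features`, `classify_file` and `toy_block_classifier`.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical block size; the abstract does not state the one used.
BLOCK_SIZE = 1024


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split raw file bytes into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_features(block: bytes) -> Dict[str, float]:
    """A few illustrative statistical / information-theoretic features of one block.

    No filetype information is used: only the raw byte values (0..255).
    """
    n = len(block)
    counts = Counter(block)  # iterating over bytes yields ints
    mean = sum(block) / n
    var = sum((b - mean) ** 2 for b in block) / n
    # Shannon entropy of the byte-value distribution, in bits (range 0..8).
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "entropy": entropy,
        "distinct": float(len(counts)),
    }


def classify_file(data: bytes,
                  block_classifier: Callable[[Dict[str, float]], bool],
                  min_fraction: float = 0.5) -> str:
    """Label every block, then correlate the block labels into one file verdict.

    Hypothetical correlation rule: the file is flagged as malware when at least
    `min_fraction` of its blocks are labeled potentially malicious.
    """
    blocks = split_blocks(data)
    if not blocks:
        return "benign"
    flags = [block_classifier(block_features(b)) for b in blocks]
    return "malware" if sum(flags) / len(flags) >= min_fraction else "benign"


def toy_block_classifier(features: Dict[str, float]) -> bool:
    """Toy stand-in for a trained model: flag near-random (high-entropy) blocks."""
    return features["entropy"] > 7.5


if __name__ == "__main__":
    # Two blocks of uniformly distributed byte values (entropy 8.0 each).
    print(classify_file(bytes(range(256)) * 8, toy_block_classifier))
    # Four blocks of a single repeated byte (entropy 0).
    print(classify_file(b"A" * 4096, toy_block_classifier))
```

Passing the block classifier in as a function keeps the feature extraction and the correlation step independent of whichever classifier is trained; in the setting the abstract describes, that slot would be filled by a standard data mining algorithm trained on labeled blocks rather than a fixed entropy threshold.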
Year
2009
DOI
10.1145/1599272.1599278
Venue
KDD Workshop on CyberSecurity and Intelligence Informatics
Keywords
actual file content, zero-day malware, malware detection, existing technique, different malware type, proposed technique, statistical analysis, novel malware detection technique, malware dataset, byte-level file content, malware detection technique, file content, forensics, data mining
Field
Byte, Data mining, Computer science, A priori and a posteriori, Software, Backdoor, Trojan, Data mining algorithm, Malware, Statistical analysis
DocType
Conference
Citations
48
PageRank
1.86
References
15
Authors
3
Name (Order; Citations; PageRank)
S. Momina Tabish (1; 119; 6.05)
M. Zubair Shafiq (2; 546; 43.41)
Muddassar Farooq (3; 1221; 83.47)