Unknown malcode detection and the imbalance problem - Citegraph

Paper Info

Title
Unknown malcode detection and the imbalance problem

Abstract
The recent growth in network usage has motivated the creation of new malicious code for various purposes. Today's signature-based antiviruses are very accurate for known malicious code, but cannot detect new malicious code. Recently, classification algorithms have been used successfully to detect unknown malicious code. However, these studies involved a test collection of limited size and the same malicious-to-benign file ratio in both the training and test sets, a situation that does not reflect real-life conditions. We present a methodology for the detection of unknown malicious code that draws on concepts from text categorization and is based on n-gram extraction from the binary code and feature selection. We performed an extensive evaluation, consisting of a test collection of more than 30,000 files, in which we investigated the class imbalance problem. In real-life scenarios, the malicious file content is expected to be low, about 10% of the total files. For practical purposes, it is unclear what the corresponding percentage in the training set should be. Our results indicate that greater than 95% accuracy can be achieved through the use of a training set that has a malicious file content of less than 33.3%.
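
The abstract names three ingredients: n-gram extraction from binary code, feature selection, and control over the malicious share of the training set. The following is a minimal Python sketch of those ideas, not the authors' implementation. The function names, the n-gram length, the use of document frequency as the selection measure, and the sampling routine are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import Counter
import random


def byte_ngrams(data: bytes, n: int = 4) -> Counter:
    """Count every overlapping n-byte sequence in a binary blob.

    The default n=4 is an arbitrary illustrative choice.
    """
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def top_k_by_document_frequency(files: list[bytes], n: int, k: int) -> list[bytes]:
    """Keep the k n-grams that occur in the most files.

    Document frequency is a stand-in selection measure (an assumption);
    any other feature-scoring function could be substituted here.
    """
    df = Counter()
    for data in files:
        # set(...) so each file contributes at most once per n-gram
        df.update(set(byte_ngrams(data, n)))
    return [gram for gram, _ in df.most_common(k)]


def sample_with_malicious_share(malicious, benign, share, size, seed=0):
    """Draw `size` labeled files, a fraction `share` of which are malicious.

    Returns a shuffled list of (file, label) pairs, label 1 = malicious.
    """
    rng = random.Random(seed)
    n_mal = round(size * share)
    picked = [(f, 1) for f in rng.sample(malicious, n_mal)]
    picked += [(f, 0) for f in rng.sample(benign, size - n_mal)]
    rng.shuffle(picked)
    return picked
```

With helpers like these, one could build several training sets that differ only in `share` and evaluate each against a test set fixed at roughly 10% malicious files, which is the kind of comparison the abstract describes.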

Year	DOI	Venue
2009	10.1007/s11416-009-0122-8	Journal in Computer Virology
Keywords	Field	DocType
imbalance problem,classification,machine learning,unknown malicious code detection,feature selection	Training set,Data mining,Network usage,Feature selection,Computer science,Computer security,Support vector machine,Binary code,Computer virus,Statistical classification,Semantics	Journal
Volume	Issue	ISSN
5	4	2263-8733
Citations	PageRank	References
29	0.98	21
Authors
6

Authors (6 rows)

Name	Order	Citations	PageRank
Robert Moskovitch	1	729	39.62
Dima Stopel	2	57	3.25
Clint Feher	3	206	9.12
Nir Nissim	4	199	19.42
Nathalie Japkowicz	5	2581	182.43
Yuval Elovici	6	2583	204.53
