Abstract | ||
---|---|---|
Generic computer virus detection is the need of the hour, as most commercial antivirus software fails to detect new and unknown viruses. Motivated by the success of data mining and machine learning techniques in intrusion detection systems, recent research on detecting malicious executables is directed towards devising efficient non-signature-based techniques that can profile program characteristics from a set of training examples. Byte sequences and byte n-grams are considered to be the basis of feature extraction. But because the number of n-grams is very large, several feature selection methods have been proposed in the literature. A recent report on the use of information-gain-based feature selection yielded the best-known result in classifying malicious executables from benign ones. We observe that information gain models the presence of an n-gram in one class and its absence in the other. Through a simple example we show that this may lead to erroneous results. In this paper, we describe a new feature selection measure, the class-wise document frequency of byte n-grams. We empirically demonstrate that the proposed method is better for feature selection. For detection, we combine several classifiers using Dempster-Shafer theory for better classification accuracy instead of using any single classifier. Our experimental results show that such a scheme detects virus programs far more efficiently than earlier known methods. |
Year | DOI | Venue |
---|---|---|
2006 | 10.1007/s11416-006-0027-8 | Journal in Computer Virology |
Keywords | Field | DocType |
machine learning,feature selection,computer virus,feature extraction,information gain,dempster shafer theory,intrusion detection system | Byte,Pattern recognition,Feature selection,Computer science,Feature (computer vision),Computer virus,Feature extraction,Artificial intelligence,Classifier (linguistics),Dempster–Shafer theory,Intrusion detection system,Machine learning | Journal |
Volume | Issue | ISSN |
2 | 3 | 1772-9904 |
Citations | PageRank | References |
51 | 1.69 | 17 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
D. Krishna Sandeep Reddy | 1 | 62 | 2.42 |
Arun K. Pujari | 2 | 420 | 48.20 |
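
The two ideas named in the abstract, class-wise document frequency of byte n-grams and classifier combination with Dempster-Shafer theory, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact selection rule, so `top_k_per_class` (taking the union of the k most frequent n-grams of each class) is an assumption, and all function names and the mass values in the usage note are hypothetical. Only `dempster_combine` follows a fixed definition, the standard Dempster rule of combination.

```python
from collections import Counter
from itertools import product


def byte_ngrams(data: bytes, n: int) -> set:
    """Return the set of distinct byte n-grams occurring in one file."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def classwise_document_frequency(files, labels, n):
    """For each class, count how many of its files contain each n-gram."""
    df = {}
    for data, label in zip(files, labels):
        # Each file contributes at most 1 to an n-gram's count (document frequency).
        df.setdefault(label, Counter()).update(byte_ngrams(data, n))
    return df


def top_k_per_class(df, k):
    """Assumed selection rule: union of the k most frequent n-grams per class."""
    selected = set()
    for counts in df.values():
        selected.update(gram for gram, _ in counts.most_common(k))
    return selected


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}
```

As a usage illustration with made-up numbers, two classifiers that assign masses `{virus: 0.6, benign: 0.1, either: 0.3}` and `{virus: 0.7, benign: 0.2, either: 0.1}` can be merged by calling `dempster_combine` with `frozenset({"virus"})`, `frozenset({"benign"})`, and `frozenset({"virus", "benign"})` as keys; the combined mass on `virus` is higher than either input because the two sources agree.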