Title
Feature reduction to speed up malware classification
Abstract
In statistical classification, one way of speeding up the process is to use only a small fraction of the available features. In this paper, we apply this technique both to malware family classification and to distinguishing malware from cleanware in a combined set. To demonstrate the usefulness of our method, we use the same sets of malware and cleanware as in an earlier paper. Using the statistical technique Information Gain (IG), we reduce the set of features used in the experiments from 7,605 to just over 1,000. The best accuracies obtained in the earlier paper using all 7,605 features are 97.3% for malware versus cleanware detection and 97.4% for malware family classification; on the reduced feature set, we obtain best accuracies of 94.6% on the malware versus cleanware test and 94.5% on the malware classification test. An interesting feature of the new tests is that false negative rates are reduced by a factor of about 1/3 compared with the results of the earlier paper. In addition, the running time of our tests is reduced by a factor of approximately 3/5 relative to the times reported for the original paper. The small loss in accuracy, the improved false negative rate, and the significant improvement in speed indicate that feature reduction should be pursued further as a tool to prevent algorithms from becoming intractable due to too much data.
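To illustrate the ranking step described in the abstract, the following Python sketch scores each feature by Information Gain, IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v), and keeps the k highest-scoring features. It is a minimal sketch under stated assumptions: features and class labels are taken to be discrete (for example, binary presence/absence indicators), and the function names and toy data are hypothetical, not taken from the paper. The paper's exact IG computation and its cutoff (just over 1,000 of 7,605 features) may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for one discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        # Labels of the samples whose feature takes this value.
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def select_top_k(rows, labels, k):
    """Return indices of the k features with the highest information gain.

    Hypothetical helper: ties are broken by the lower feature index.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((information_gain(column, labels), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


# Toy example (hypothetical data): feature 0 separates the classes perfectly.
rows = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
labels = ["malware", "malware", "clean", "clean"]
print(select_top_k(rows, labels, 1))  # prints [0]
```

Ranking features independently like this is cheap (one pass per feature), which is what makes IG attractive as a pre-processing filter before training a classifier on the reduced set; it does not, however, account for redundancy between the selected features.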
Year: 2011
DOI: 10.1007/978-3-642-29615-4_13
Venue: NordSec
Keywords: cleanware test, former paper, feature reduction, statistical classification work, earlier paper, best accuracy, reduced feature set, original paper, malware classification test, malware family classification, dynamic analysis
Field: Data mining, Pattern recognition, Computer science, Information gain, Feature set, Artificial intelligence, Statistical classification, Malware, Speedup
DocType: Conference
Volume: 7161
ISSN: 0302-9743
Citations: 5
PageRank: 0.53
References: 14
Authors: 3
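Author details (order, citations, PageRank):
Veelasha Moonsamy: order 1, citations 107, PageRank 7.75
Ronghua Tian: order 2, citations 145, PageRank 7.90
Lynn Batten: order 3, citations 122, PageRank 10.42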