Title
Improving malicious URLs detection via feature engineering: Linear and nonlinear space transformation methods
Abstract
In malicious URL detection, traditional classifiers are challenged because the data volume is huge, patterns change over time, and the correlations among features are complicated. Feature engineering plays an important role in addressing these problems. To better represent the underlying problem and improve the performance of classifiers in identifying malicious URLs, this paper proposes a combination of linear and nonlinear space transformation methods. For the linear transformation, a two-stage distance metric learning approach was developed: first, singular value decomposition was performed to obtain an orthogonal space, and then a linear program was solved to find an optimal distance metric. For the nonlinear transformation, we introduced the Nyström method for kernel approximation and used the revised distance metric in its radial basis function, so that the merits of both linear and nonlinear transformations can be utilized. A total of 331,622 URLs with 62 features were collected to validate the proposed feature engineering methods. The results showed that the proposed methods significantly improved the efficiency and performance of certain classifiers, such as k-Nearest Neighbor, Support Vector Machine, and neural networks. The malicious URL identification rate of k-Nearest Neighbor increased from 68% to 86%, that of linear Support Vector Machine from 58% to 81%, and that of Multi-Layer Perceptron from 63% to 82%. We also developed a website to demonstrate a malicious URL detection system that uses the methods proposed in this paper. The system can be accessed at: http://url.jspfans.com.
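The two transformations described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the data are synthetic, the function names (`svd_rotate`, `weighted_rbf`, `nystroem_features`) are hypothetical, and the linear-programming step is not reproduced. A vector of unit weights `w` stands in for the learned diagonal distance metric, which is an assumption made only for illustration.

```python
import numpy as np

def svd_rotate(X):
    """Center X and rotate it onto its right singular vectors (an orthogonal basis)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return (X - mean) @ Vt.T

def weighted_rbf(A, B, w, gamma):
    """RBF kernel under a diagonal metric: exp(-gamma * sum_i w_i * (a_i - b_i)^2)."""
    Aw = A * np.sqrt(w)
    Bw = B * np.sqrt(w)
    sq = (Aw ** 2).sum(axis=1)[:, None] + (Bw ** 2).sum(axis=1)[None, :] - 2.0 * Aw @ Bw.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystroem_features(Z, landmarks, w, gamma):
    """Nystrom feature map: K(Z, L) @ K(L, L)^(-1/2), with L a small landmark set."""
    K_mm = weighted_rbf(landmarks, landmarks, w, gamma)
    vals, vecs = np.linalg.eigh(K_mm)
    vals = np.maximum(vals, 1e-12)  # guard against tiny or negative eigenvalues
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return weighted_rbf(Z, landmarks, w, gamma) @ inv_sqrt

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))      # synthetic stand-in for URL feature vectors
Z = svd_rotate(X)                  # stage 1: orthogonal space via SVD
w = np.ones(Z.shape[1])            # placeholder for the LP-learned metric (assumption)
idx = rng.choice(len(Z), size=20, replace=False)
Phi = nystroem_features(Z, Z[idx], w, gamma=0.1)  # approximate kernel features
```

The rows of `Phi` could then be passed to a linear classifier, so that a linear model operates on an approximation of the RBF kernel space instead of computing the full kernel matrix.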
Year
2020
DOI
10.1016/j.is.2020.101494
Venue
Information Systems
Keywords
Feature engineering, Malicious URLs detection, Nyström method, Distance metric learning, Singular value decomposition
DocType
Journal
Volume
91
ISSN
0306-4379
Citations
3
PageRank
0.37
References
8
Authors
3
Name        Order    Citations    PageRank
Tie Li      1        13           1.89
Gang Kou    2        2527         191.95
Yi Peng     3        1303         78.20