Textual analysis of traitor-based dataset through semi supervised machine learning - Citegraph

Paper Info

Title
Textual analysis of traitor-based dataset through semi supervised machine learning

Abstract
Insider threats are a growing security concern and among the most challenging that government agencies, organizations, and institutions face. In such scenarios, malicious (red) activities are performed by authorized individuals within the organization, which makes insider threats harder to identify than other attacks. Along with other monitoring parameters, email logs play a vital role in many research areas, such as tracking insider threats involving collaborating traitors, textual analysis, and social media exploration. This paper presents a semi-supervised machine learning framework that combines pre-processing and classification techniques for an unlabeled email dataset. The Enron Corporation dataset is used for experiments and the TWOS dataset for evaluation of the proposed framework. First, the dataset is transformed into vector form using Term Frequency–Inverse Document Frequency (TF–IDF). Next, K-Means clustering is used to group emails based on message content. Finally, a Decision Tree (DT) classifier is applied to classify the malicious activities. The proposed framework has also been tested with other algorithms, namely Logistic Regression (LR), Naive Bayes (NB), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), and Neural Network (NN). Of all the algorithms tested, DT combined with the pre-processing steps gave the desired results, with 99.96% accuracy and an AUC of 0.994 in identifying malicious content.
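The abstract describes a three-stage pipeline: TF–IDF vectorization, K-Means clustering to derive labels for unlabeled emails, and a Decision Tree trained on those labels. The following is a minimal, dependency-free Python sketch of that flow, not the authors' implementation. Every setting is an assumption not taken from the paper: whitespace tokenization, scikit-learn-style smoothed IDF, k = 2 clusters, farthest-point centroid seeding, and a depth-limited tree using Gini splits. The function names (`tfidf`, `kmeans`, `build_tree`, `predict`) and the four sample emails are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """L2-normalised TF-IDF rows; smoothed IDF ln((1+n)/(1+df)) + 1 is an assumption."""
    tokenized = [doc.lower().split()for doc in docs]  # assumed: whitespace tokenization
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        vectors.append([v / norm for v in row])
    return vocab, vectors

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k=2, iters=20):
    """Lloyd's K-Means; k=2 and farthest-point seeding are assumptions."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each email to its nearest centroid, then recompute centroids as means.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=3):
    """Tiny CART-style tree: greedy Gini splits of the form feature <= threshold."""
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:
        return Counter(y).most_common(1)[0][0]
    _, f, thr, left, right = best
    return (f, thr,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def predict(tree, row):
    # Internal nodes are (feature, threshold, left, right) tuples; leaves are labels.
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree

# Hypothetical sample emails (not from Enron or TWOS).
emails = [
    "quarterly budget meeting agenda attached",
    "budget review meeting on friday",
    "send the confidential client list to my personal account",
    "copy confidential files to personal drive tonight",
]
vocab, X = tfidf(emails)
pseudo = kmeans(X, k=2)          # cluster IDs used as pseudo-labels
tree = build_tree(X, pseudo)     # supervised stage trained on those labels
acc = sum(predict(tree, x) == y for x, y in zip(X, pseudo)) / len(X)
print(pseudo, acc)  # → [0, 0, 1, 1] 1.0
```

K-Means cluster IDs are arbitrary, so deciding which cluster counts as malicious is a separate labeling step this sketch does not model. The accuracy printed here is measured on the same pseudo-labels the tree was trained on, so it says nothing about detection performance on a held-out set such as TWOS.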

Year: 2021
DOI: 10.1016/j.future.2021.06.036
Venue: Future Generation Computer Systems
Keywords: Malicious emails, Insider threat, Machine learning, Enron dataset, TWOS dataset, Text classification
DocType: Journal
Volume: 125
ISSN: 0167-739X
Citations: 1
PageRank: 0.34
References: 0
Authors: 5

Authors

Name	Order	Citations	PageRank
Faisal Janjua	1	1	0.34
Asif Masood	2	137	12.91
Haider Abbas	3	391	43.88
Imran Rashid	4	16	4.05
Malik Muhammad Zaki Murtaza Khan	5	1	0.68
