Title
Data Loss Prevention For Cross-Domain Instant Messaging
Abstract
This paper proposes a cascading classifier for inspecting and validating the payload of chat messages in (military) instant messaging. The first step in the cascading classifier pipeline is an anomaly detection-based method whose purpose is to ensure that the message channel is not used to exfiltrate non-message data such as images, documents, binary files or encrypted content. Messages that pass this filtering phase then have their content analyzed for the presence of known sensitive information. This data loss prevention step is enhanced by incorporating an author profile signal that assesses the validity of the claimed authorship by capturing the stylometric signature embedded in each user's past message stream. The hypothesis is that inferring and including latent author traits such as gender, age and ethnicity will aid the data loss prevention solution by reducing the number of incorrect classifications. Experiments were conducted using message traffic generated during a field-training exercise conducted by members of the armed forces, as well as an internal repository of classified documents and a variety of non-message data sources. The results demonstrated that the proposed traffic-filtering classifier is successful in distinguishing between legitimate and illegitimate traffic. Further, the experiments showed that constructing authorship verification models using sparse messages as a training set is feasible, and that including this signal in the data loss prevention solution leads to a significant increase in predictive performance for the cross-domain messaging setting.
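The two-stage cascade described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the byte-entropy filter, the term-matching content score, the additive combination with an authorship score, and every name and threshold below (`looks_like_message`, `sensitivity_score`, `cascade`, `max_entropy`, `author_weight`, `threshold`) are assumptions chosen only to show how a cheap traffic filter can precede a content check that also takes an authorship-verification signal into account.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the byte-level Shannon entropy in bits per byte (0.0 if empty)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def looks_like_message(payload: bytes, max_entropy: float = 5.0) -> bool:
    """Stage 1 (hypothetical stand-in for the anomaly detector).

    Rejects payloads that are not valid UTF-8 or whose byte entropy exceeds
    max_entropy, as encoded binary or encrypted content tends to do.
    """
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return shannon_entropy(payload) <= max_entropy


def sensitivity_score(text: str, sensitive_terms: set) -> float:
    """Stage 2 (hypothetical): fraction of tokens that are known sensitive terms."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in sensitive_terms for t in tokens) / len(tokens)


def cascade(payload: bytes, sensitive_terms: set, author_score: float,
            threshold: float = 0.2, author_weight: float = 0.3) -> str:
    """Run the filter first, then the content check.

    author_score is assumed to lie in [0, 1], where 1 means the claimed
    author is fully verified. The additive combination is illustrative only.
    """
    if not looks_like_message(payload):
        return "reject: non-message payload"
    content = sensitivity_score(payload.decode("utf-8"), sensitive_terms)
    risk = content + (1.0 - author_score) * author_weight
    return "hold: possible data loss" if risk >= threshold else "release"


if __name__ == "__main__":
    terms = {"grid", "secret"}
    print(cascade(b"lunch at noon", terms, author_score=1.0))
    print(cascade(b"grid alpha secret", terms, author_score=1.0))
    print(cascade(bytes(range(256)), terms, author_score=1.0))
```

Running the cheap filter first means the more expensive content and authorship analysis is only applied to payloads that plausibly are chat messages, which is the point of arranging the classifiers as a cascade.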
Year
2017
Venue
2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI)
Field
Anomaly detection, Data mining, Data loss, Inference, Computer science, Support vector machine, Communication channel, Encryption, Classifier (linguistics), Information sensitivity
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
4
Name, Order, Citations, PageRank
Kyrre Wahl Kongsgård, 1, 7, 1.27
Nils Agne Nordbotten, 2, 90, 5.78
Federico Mancini, 3, 78, 9.79
Paal E. Engelstad, 4, 280, 34.38