Title
SankeyVis: Visualizing active relationship from emails based on multiple dimensions and topic classification methods
Abstract
The explosive growth of email has driven the rapid development of email-based forensics and, at the same time, brought enormous challenges. As a special kind of data, email consists of structured metadata and an unstructured body. At present, most attention goes to visual forensics of the metadata, which focuses on mining social network relationships between senders and recipients. These methods limit the scope of email visualization forensics. Visual forensics based on semantic analysis of the email body is relatively rare, and connecting semantic analysis with visualization is difficult. In recent years, the booming development of machine learning has extended the focus of forensics to the email body. This paper proposes SankeyVis, a visualization model for email forensics of active relationships based on multiple dimensions and LDA topic classification, focusing on mining social relationships and semantic patterns in emails. SankeyVis works from four data attributes of the email, “From,” “To,” “Date,” and “Body,” and divides the data by structure into two parts: the email header and the email body. The email header is used to select address pairs that have a working relationship. For the email body, the Latent Dirichlet Allocation (LDA) model classifies the topics discussed, and an adapted Sankey diagram supports forensic work on topic semantics. SankeyVis tested well when applied to the Enron corpus. It integrates structured and unstructured data in the visualization of email forensics, achieving visual forensics of email content. It breaks the limitations of dimensions and supports adding more than four attributes for forensics, extending the breadth of email forensics.
SankeyVis reveals the topics that senders and recipients with active relationships discussed in different time units, and it supports forensics of email content at varying levels of relationship, extending the depth of email forensics.
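The header-side steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the directed-pair counting, the `min_messages` threshold, and the monthly time unit are all assumptions made for illustration, and the LDA topic step on the email body is left out. It uses only the standard library to extract the four attributes, keep address pairs that exchange enough messages, and aggregate them into weighted links of the kind a Sankey diagram could draw.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def extract_record(raw):
    """Return the four attributes (From, To, Date, Body) of one raw message."""
    msg = message_from_string(raw)
    sender = getaddresses([msg.get("From", "")])[0][1].lower()
    recipients = [
        addr.lower() for _, addr in getaddresses(msg.get_all("To", [])) if addr
    ]
    date = parsedate_to_datetime(msg["Date"])
    body = msg.get_payload()  # plain string for non-multipart messages
    return sender, recipients, date, body


def active_pairs(records, min_messages=2):
    """Keep directed (sender, recipient) pairs seen at least min_messages times.

    The threshold is a hypothetical stand-in for the paper's selection step.
    """
    counts = Counter()
    for sender, recipients, _date, _body in records:
        for recipient in recipients:
            counts[(sender, recipient)] += 1
    return {pair for pair, n in counts.items() if n >= min_messages}


def sankey_links(records, pairs, time_format="%Y-%m"):
    """Count messages per (time unit, sender, recipient) for the active pairs."""
    links = Counter()
    for sender, recipients, date, _body in records:
        for recipient in recipients:
            if (sender, recipient) in pairs:
                links[(date.strftime(time_format), sender, recipient)] += 1
    return links
```

In a fuller pipeline, each record's body would additionally be assigned a topic label (for example by an LDA model), and that label would become a further key in the link counts so that the diagram shows which topics flow between which addresses in each time unit.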
Year: 2020
DOI: 10.1016/j.fsidi.2020.300981
Venue: Forensic Science International: Digital Investigation
Keywords: Digital email forensics, Social relationship, Visualization, LDA Model, Semantic analysis, Sankey diagram
DocType: Journal
Volume: 35
ISSN: 2666-2817
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name           Order  Citations  PageRank
Yong Fang      1      191        31.43
Cuirong Zhao   2      0          0.34
Cheng Huang    3      0          1.01
Liang Liu      4      163        40.93