Title
Linking Personally Identifiable Information from the Dark Web to the Surface Web: A Deep Entity Resolution Approach
Abstract
The information privacy of Internet users has become a major societal concern. The rapid growth of online services increases the risk of unauthorized access to Personally Identifiable Information (PII) of at-risk populations, who are unaware of their PII exposure. To proactively identify online at-risk populations and increase their privacy awareness, it is crucial to conduct a holistic privacy risk assessment across the Internet. Current privacy risk assessment studies are limited to a single platform within either the surface web or the dark web. A comprehensive privacy risk assessment requires matching exposed PII on heterogeneous online platforms across the surface web and the dark web. However, due to the incompleteness and inaccuracy of PII records on each platform, linking the exposed PII to users is a non-trivial task. While Entity Resolution (ER) techniques can be used to facilitate this task, they often require ad-hoc, manual rule development and feature engineering. Recently, Deep Learning (DL)-based ER has outperformed manual entity matching rules by automatically extracting prominent features from incomplete or inaccurate records. In this study, we enhance the existing privacy risk assessment with a DL-based ER method, namely Multi-Context Attention (MCA), to comprehensively evaluate individuals' PII exposure across different online platforms on the dark web and the surface web. Evaluation against benchmark ER models indicates the efficacy of MCA. Using MCA on a random sample of data breach victims on the dark web, we are able to identify 4.3% of the victims on surface web platforms and calculate their privacy risk scores.
Year
2020
DOI
10.1109/ICDMW51313.2020.00072
Venue
2020 International Conference on Data Mining Workshops (ICDMW)
Keywords
Privacy, PII, Data breach, Dark web, Surface web, Data collection
DocType
Conference
ISSN
2375-9232
ISBN
978-1-7281-9013-6
Citations
0
PageRank
0.34
References
0
Authors
9
Name                            Order  Citations  PageRank
Fang Yu Lin                     1      0          0.34
Yizhi Liu                       2      0          0.34
Mohammadreza Barouni-ebrahimi   3      12         4.84
Zara Ahmad-Post                 4      0          0.34
James Lee Hu                    5      0          1.35
Jingyu Xin                      6      0          0.34
Sagar Samtani                   7      4          2.42
Weifeng Li                      8      33         5.48
Hsinchun Chen                   9      9569       813.33