Title
Identifying, Collecting, and Monitoring Personally Identifiable Information: From the Dark Web to the Surface Web
Abstract
Personally identifiable information (PII) has become a major target of cyber-attacks, causing severe losses to data breach victims. To protect data breach victims, researchers focus on collecting exposed PII to assess privacy risk and identify at-risk individuals. However, existing studies mostly rely on exposed PII collected from either the dark web or the surface web. Due to the wide exposure of PII on both the dark web and surface web, collecting from only the dark web or the surface web could result in an underestimation of privacy risk. Despite its research and practical value, jointly collecting PII from both sources is a non-trivial task. In this paper, we summarize our effort to systematically identify, collect, and monitor a total of 1,212,004,819 exposed PII records across both the dark web and surface web. Our effort resulted in 5.8 million stolen SSNs, 845,000 stolen credit/debit cards, and 1.2 billion stolen account credentials. From the surface web, we identified and collected over 1.3 million PII records of the victims whose PII is exposed on the dark web. To the best of our knowledge, this is the largest academic collection of exposed PII, which, if properly anonymized, enables various privacy research inquiries, including assessing privacy risk and identifying at-risk populations.
Year
2020
DOI
10.1109/ISI49825.2020.9280540
Venue
2020 IEEE International Conference on Intelligence and Security Informatics (ISI)
Keywords
PII, privacy, data breach, dark web, surface web, data collection
DocType
Conference
ISBN
978-1-7281-8801-0
Citations
0
PageRank
0.34
References
0
Authors
9
Name                            Order  Citations  PageRank
Yizhi Liu                       1      0          0.34
Fang Yu Lin                     2      0          0.34
Zara Ahmad-Post                 3      0          0.34
Mohammadreza Barouni-ebrahimi   4      12         4.84
N. Zhang                        5      54         32.13
James Lee Hu                    6      0          1.35
Jingyu Xin                      7      0          0.34
Weifeng Li                      8      33         5.48
Hsinchun Chen                   9      9569       813.33