Title
De-anonymizing private data by matching statistics.
Abstract
Recent research has illustrated privacy breaches that can be effected on an anonymized dataset by an attacker who has access to auxiliary information about the users. Most of these attack strategies rely on the uniqueness of specific aspects of the users' data; e.g., observing a mobile user at just a few points in the time-location space is sufficient to uniquely identify him/her within an anonymized set of users. In this work, we consider de-anonymization attacks on anonymized summary statistics in the form of histograms. Such summary statistics are useful for many applications that do not need knowledge of exact user behavior. We consider an attacker who has access to an anonymized set of histograms of K users' data and an independent set of data belonging to the same users. Modeling the users' data as i.i.d., we study the composite hypothesis testing problem of identifying the correct matching between the anonymized histograms from the first set and the user data from the second. We propose a Generalized Likelihood Ratio Test as a solution to this problem and show that the solution can be identified using a minimum weight matching algorithm on a K x K complete bipartite weighted graph. We show that a variant of this solution is asymptotically optimal as the data lengths are increased. We apply the algorithm to mobility traces of over 1000 users on the EPFL campus collected during two weeks and show that up to 70% of the users can be correctly matched. These results show that anonymized summary statistics of mobility traces themselves contain a significant amount of information that can be used to uniquely identify users by an attacker who has access to auxiliary information about the statistics.
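The matching step described in the abstract can be illustrated with a small sketch. The abstract does not give the edge weights of the bipartite graph, so the weight used here (the sum of KL divergences of two histograms from their average) is an assumption chosen for illustration, not the paper's stated formula. The function names and the toy data are likewise hypothetical. The matching is found by exhaustive search over permutations, which is exact but practical only for small K.

```python
import itertools
import math
from collections import Counter


def histogram(samples, alphabet):
    """Empirical distribution of `samples` over `alphabet` (normalized counts)."""
    counts = Counter(samples)
    n = len(samples)
    return [counts[a] / n for a in alphabet]


def kl(p, q):
    """KL divergence D(p || q); terms with p[i] == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def edge_weight(p, q):
    """ASSUMED weight: divergence of each histogram from their average."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl(p, mid) + kl(q, mid)


def min_weight_matching(anonymized, auxiliary):
    """Exact minimum weight perfect matching by exhaustive search (small K only).

    Returns a list m where anonymized[i] is matched to auxiliary[m[i]].
    """
    k = len(anonymized)
    cost = [[edge_weight(a, b) for b in auxiliary] for a in anonymized]
    best = min(
        itertools.permutations(range(k)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(k)),
    )
    return list(best)


# Toy data (hypothetical): 3 users, 4 locations, two observation periods each.
alphabet = "ABCD"
first_period = ["AAAAAAABCD", "BBBBBBBACD", "CCCCCCCABD"]
second_period = ["AAAAAABBCD", "BBBBBBACCD", "CCCCCCADDB"]

# The published histograms are shuffled: position 0 holds user 2, and so on.
anonymized = [histogram(first_period[u], alphabet) for u in (2, 0, 1)]
auxiliary = [histogram(s, alphabet) for s in second_period]

matching = min_weight_matching(anonymized, auxiliary)
print(matching)  # [2, 0, 1]
```

For a K in the range reported in the abstract (over 1000 users), the exhaustive search would be replaced by a polynomial-time assignment solver such as the Hungarian algorithm; the cost matrix construction stays the same.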
Year: 2013
DOI: 10.1109/Allerton.2013.6736722
Venue: Allerton
Keywords: data privacy, graph theory, mobile computing, statistical testing, EPFL campus, anonymized histogram set access, anonymized summary statistics, anonymized user data set, asymptotic optimality, auxiliary information access, complete bipartite weighted graph, composite hypothesis testing problem, data lengths, generalized likelihood ratio test, histograms, independent data set, minimum weight matching algorithm, mobile user, mobility traces, privacy breach, private data de-anonymization attack strategies, time-location space, user data modeling
DocType: Conference
ISSN: 2474-0195
Citations: 3
PageRank: 0.41
References: 0
Authors: 2
Name                       Order  Citations  PageRank
Jayakrishnan Unnikrishnan  1      280        21.34
Farid Movahedi Naini       2      33         2.71