Title
Anonymization of location data does not work: a large-scale measurement study
Abstract
We examine a very large-scale data set of more than 30 billion call records made by 25 million cell phone users across all 50 states of the US and attempt to determine to what extent anonymized location data can reveal private user information. Our approach is to infer, from the call records, the "top N" locations for each user and correlate this information with publicly available side information such as census data. For example, the measured "top 2" locations likely correspond to home and work locations, the "top 3" to home, work, and shopping/school/commute path locations. We consider the cases where those "top N" locations are measured with different levels of granularity, ranging from a cell sector to whole cell, zip code, city, county and state. We then compute the anonymity set, namely the number of users uniquely identified by a given set of "top N" locations at different granularity levels. We find that the "top 1" location does not typically yield small anonymity sets. However, the "top 2" and "top 3" locations do, certainly at the sector or cell-level granularity. We consider a variety of different factors that might impact the size of the anonymity set, for example the distance between the "top N" locations or the geographic environment (rural vs. urban). We also examine to what extent specific side information, in particular the size of the user's social network, decreases the anonymity set and therefore increases risks to privacy. Our study shows that sharing anonymized location data will likely lead to privacy risks and that, at a minimum, the data needs to be coarse in either the time domain (meaning the data is collected over short periods of time, in which case inferring the "top N" locations reliably is difficult) or the space domain (meaning the data granularity is strictly coarser than the cell level). In both cases, the utility of the anonymized location data will be decreased, potentially by a significant amount.
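The measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy records, and the "cell.sector" location naming are hypothetical. It also assumes that the anonymity set of a user is the group of users sharing the same ordered "top N" locations, and that coarsening (here, sector to cell) is applied before counting.

```python
from collections import Counter, defaultdict


def top_n_locations(records, n):
    """Return {user: tuple of that user's n most frequent locations}.

    `records` is an iterable of (user, location) pairs, one per call.
    Users seen at fewer than n distinct locations get a shorter tuple.
    """
    counts = defaultdict(Counter)
    for user, location in records:
        counts[user][location] += 1
    top = {}
    for user, counter in counts.items():
        # Rank by descending call count; break ties by location name
        # so the result is deterministic.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        top[user] = tuple(loc for loc, _ in ranked[:n])
    return top


def anonymity_set_sizes(records, n, coarsen=lambda loc: loc):
    """Return {user: number of users sharing that user's top-n locations}.

    `coarsen` maps a fine-grained location to a coarser one
    (assumption: applied to every record before counting).
    """
    coarse = [(user, coarsen(loc)) for user, loc in records]
    top = top_n_locations(coarse, n)
    group_sizes = Counter(top.values())
    return {user: group_sizes[key] for user, key in top.items()}


# Made-up toy records: (user, "cell.sector").
records = [
    ("u1", "A.1"), ("u1", "A.1"), ("u1", "B.1"),
    ("u2", "A.2"), ("u2", "A.2"), ("u2", "B.2"),
    ("u3", "A.1"), ("u3", "A.1"), ("u3", "C.1"),
]

sector_level = anonymity_set_sizes(records, n=2)
cell_level = anonymity_set_sizes(records, n=2,
                                 coarsen=lambda s: s.split(".")[0])
print(sector_level)
print(cell_level)
```

In this toy data every user is unique at the sector level with N = 2, while coarsening to the cell level puts u1 and u2 in the same anonymity set, which mirrors the granularity effect the abstract reports.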
Year
2011
DOI
10.1145/2030613.2030630
Venue
MobiCom
Keywords
top n location, cell level, census data, top n, anonymity set, extent anonymized location data, large-scale data, data granularity, small anonymity set, large-scale measurement study, anonymized location data, location, time domain, social network, privacy
Field
Data mining, Social network, Information retrieval, Computer science, Computer network, k-anonymity, User information, Location data, Phone, Ranging, Granularity, Anonymity
DocType
Conference
Citations
122
PageRank
3.57
References
24
Authors
2
Name | Order | Citations | PageRank
Hui Zang | 1 | 1052 | 77.25
Jean Bolot | 2 | 203 | 10.69