Title
Discovering frequent patterns in sensitive data
Abstract
Discovering frequent patterns from data is a popular exploratory technique in datamining. However, if the data are sensitive (e.g., patient health records, user behavior records) releasing information about significant patterns or trends carries significant risk to privacy. This paper shows how one can accurately discover and release the most significant patterns along with their frequencies in a data set containing sensitive information, while providing rigorous guarantees of privacy for the individuals whose information is stored there. We present two efficient algorithms for discovering the k most frequent patterns in a data set of sensitive records. Our algorithms satisfy differential privacy, a recently introduced definition that provides meaningful privacy guarantees in the presence of arbitrary external information. Differentially private algorithms require a degree of uncertainty in their output to preserve privacy. Our algorithms handle this by returning 'noisy' lists of patterns that are close to the actual list of k most frequent patterns in the data. We define a new notion of utility that quantifies the output accuracy of private top-k pattern mining algorithms. In typical data sets, our utility criterion implies low false positive and false negative rates in the reported lists. We prove that our methods meet the new utility criterion; we also demonstrate the performance of our algorithms through extensive experiments on the transaction data sets from the FIMI repository. While the paper focuses on frequent pattern mining, the techniques developed here are relevant whenever the data mining output is a list of elements ordered according to an appropriately 'robust' measure of interest.
Year
DOI
Venue
2010
10.1145/1835804.1835869
KDD
Keywords
Field
DocType
data mining output,sensitive data,transaction data set,sensitive information,typical data set,arbitrary external information,differential privacy,meaningful privacy guarantee,significant pattern,frequent pattern mining,frequent pattern,data mining,false positive,transaction data,privacy,satisfiability
Data mining,Data set,Differential privacy,Computer science,Information sensitivity,Transaction data
Conference
Citations 
PageRank 
References 
93
3.32
26
Authors
4
Name
Order
Citations
PageRank
Raghav Bhaskar11919.88
Srivatsan Laxman242121.65
Adam Smith34183229.81
Abhradeep Thakurta450028.18