Title
Disclosure Risk Evaluation For Fully Synthetic Categorical Data
Abstract
We present an approach for evaluating disclosure risks for fully synthetic categorical data. The basic idea is to compute probability distributions of unknown confidential data values given the synthetic data and assumptions about intruder knowledge. We use a "worst-case" scenario of an intruder knowing all but one of the records in the confidential data. To create the synthetic data, we use a Dirichlet process mixture of products of multinomial distributions, which is a Bayesian version of a latent class model. In addition to generating synthetic data with high utility, the likelihood function admits simple and convenient approximations to the disclosure risk probabilities via importance sampling. We illustrate the disclosure risk computations by synthesizing a subset of data from the American Community Survey.
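The computation described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: instead of the Dirichlet process mixture of products of multinomials, it assumes a single categorical variable with a conjugate Dirichlet-multinomial synthesizer, a uniform intruder prior over the unknown value, and a synthetic dataset drawn from one parameter draw. The function names (`disclosure_probs`, `dirichlet_draw`, `log_lik`) and all inputs are hypothetical. Parameter draws from the posterior given the full confidential data are reweighted by importance sampling to approximate the posterior under each candidate value of the one unknown record.

```python
import math
import random


def dirichlet_draw(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gammas."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


def log_lik(synth_counts, theta):
    """Multinomial log-likelihood of the synthetic counts, dropping the
    multinomial coefficient (it is the same for every candidate value)."""
    return sum(c * math.log(t) for c, t in zip(synth_counts, theta) if c > 0)


def disclosure_probs(known_counts, true_value, synth_counts,
                     prior=1.0, draws=20000, seed=0):
    """Approximate p(unknown record = y | synthetic data, known records).

    known_counts: category counts of the records the intruder knows
    true_value:   category of the one unknown record (used only to form the
                  posterior that the synthesizer actually sampled from)
    synth_counts: category counts of the released synthetic data
    Assumption: uniform intruder prior over the unknown record's category.
    """
    rng = random.Random(seed)
    k = len(known_counts)

    # Posterior given the full confidential data (known records + true value).
    full = list(known_counts)
    full[true_value] += 1
    alpha = [prior + c for c in full]
    thetas = [dirichlet_draw(alpha, rng) for _ in range(draws)]

    # Likelihood of the synthetic data under each draw, rescaled for stability.
    logs = [log_lik(synth_counts, th) for th in thetas]
    shift = max(logs)
    liks = [math.exp(v - shift) for v in logs]

    scores = []
    for y in range(k):
        # Importance weight: posterior with candidate y over posterior with
        # the true value, proportional to theta[y] / theta[true_value].
        weights = [th[y] / th[true_value] for th in thetas]
        # Self-normalized estimate of p(synthetic data | candidate y, known).
        scores.append(sum(w * l for w, l in zip(weights, liks)) / sum(weights))

    total = sum(scores)
    return [s / total for s in scores]


probs = disclosure_probs(known_counts=[5, 3, 2], true_value=0,
                         synth_counts=[6, 3, 2])
print(probs)
```

In this conjugate toy model the same probabilities are also available in closed form, proportional to (a_y + z_y) / a_y with a_y = prior + known_counts[y] and z_y = synth_counts[y], which gives roughly 0.37, 0.32, and 0.31 for the inputs above and offers a way to check the Monte Carlo estimate. No such closed form is assumed for the mixture model named in the abstract, which is where the importance sampling approximation earns its keep.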
Year: 2014
DOI: 10.1007/978-3-319-11257-2_15
Venue: PRIVACY IN STATISTICAL DATABASES, PSD 2014
Keywords: Bayesian, confidentiality, Dirichlet process, disclosure, microdata
Field: Data mining, Importance sampling, Dirichlet process, Likelihood function, Categorical variable, Computer science, Multinomial distribution, Synthetic data, Probability distribution, Statistics, Bayesian probability
DocType: Conference
Volume: 8744
ISSN: 0302-9743
Citations: 5
PageRank: 0.61
References: 6
Authors: 3
Name, Order, Citations, PageRank
Jingchen Hu, 1, 6, 1.69
Jerome P. Reiter, 2, 216, 22.12
Quanli Wang, 3, 5, 0.61