Title
Accounting for Intruder Uncertainty Due to Sampling When Estimating Identification Disclosure Risks in Partially Synthetic Data
Abstract
Partially synthetic data comprise the units originally surveyed with some collected values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple draws from statistical models. Because the original records remain on the file, intruders may be able to link those records to external databases, even though values are synthesized. We illustrate how statistical agencies can evaluate the risks of identification disclosures before releasing such data. We compute risk measures when intruders know who is in the sample and when the intruders do not know who is in the sample. We use classification and regression trees to synthesize data from the U.S. Current Population Survey.
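To make the matching-based risk evaluation described in the abstract concrete, the sketch below computes two summary measures on toy data. It is an illustration under stated assumptions, not the authors' code: the intruder is assumed to know who is in the sample, a match is assumed to mean exact equality on the key variables, and the names `match_probabilities`, `risk_measures`, `targets`, and `synthetic_datasets` are hypothetical. The expected match risk and true match rate follow one common formulation from the disclosure-risk literature and may differ in detail from the measures used in the paper.

```python
def match_probabilities(target, synthetic_datasets):
    """Average, over the released synthetic datasets, a uniform probability
    spread across the records whose key values equal the target's."""
    n = len(synthetic_datasets[0])
    probs = [0.0] * n
    for data in synthetic_datasets:
        matches = [i for i, rec in enumerate(data) if rec == target]
        for i in matches:
            probs[i] += 1.0 / len(matches)
    m = len(synthetic_datasets)
    return [p / m for p in probs]


def risk_measures(targets, synthetic_datasets, tol=1e-12):
    """Return (expected match risk, true match rate).

    targets[j] holds the true key values of sampled record j, which is
    what the intruder is assumed to know from an external database.
    """
    expected_match_risk = 0.0
    true_matches = 0
    for j, target in enumerate(targets):
        probs = match_probabilities(target, synthetic_datasets)
        best = max(probs)
        if best == 0.0:
            continue  # no released record matches this target
        top = [i for i, p in enumerate(probs) if abs(p - best) <= tol]
        if j in top:
            # The correct record is among the intruder's best guesses.
            expected_match_risk += 1.0 / len(top)
            if len(top) == 1:
                true_matches += 1
    return expected_match_risk, true_matches / len(targets)


# Toy example (assumed data): keys are (age group, sex); age is synthesized.
targets = [("30-39", "F"), ("30-39", "F"), ("40-49", "M")]
synthetic_datasets = [
    [("30-39", "F"), ("40-49", "F"), ("40-49", "M")],
    [("30-39", "F"), ("30-39", "F"), ("30-39", "M")],
]
risk, rate = risk_measures(targets, synthetic_datasets)
print(risk, rate)
```

In this toy case, records 0 and 2 are each the intruder's unique best match and record 1 is missed. Handling an intruder who does not know who is in the sample would require replacing the within-sample match counts with population-level counts, which this sketch does not attempt.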
Year
2008
DOI
10.1007/978-3-540-87471-3_19
Venue
Privacy in Statistical Databases
Keywords
intruder uncertainty, partially synthetic data, synthetic data, CART, risk measure, high risk, estimating identification disclosure risks, key identifiers, risk, U.S. Current Population Survey, multiple draw, identification disclosure, statistical model, disclosure, statistical agency, external databases
Field
Current Population Survey, Data mining, Identifier, Regression, Computer science, Synthetic data, Statistical model, Sampling (statistics), Statistics
DocType
Conference
Volume
5262
ISSN
0302-9743
Citations
11
PageRank
1.02
References
5
Authors
2
Name              Order  Citations  PageRank
Jörg Drechsler    1      39         5.15
Jerome P. Reiter  2      216        22.12