Title
Privacy, accuracy, and consistency too: a holistic solution to contingency table release
Abstract
The contingency table is a workhorse of official statistics, the format of reported data for the US Census, the Bureau of Labor Statistics, and the Internal Revenue Service. In many such settings privacy is not only ethically mandated but frequently legally mandated as well. Consequently there is an extensive and diverse literature dedicated to the problems of statistical disclosure control in contingency table release. However, all current techniques for reporting contingency tables fall short on at least one of privacy, accuracy, and consistency (among multiple released tables). We propose a solution that provides strong guarantees for all three desiderata simultaneously. Our approach can be viewed as a special case of a more general approach for producing synthetic data: any privacy-preserving mechanism for contingency table release begins with raw data and produces a (possibly inconsistent) privacy-preserving set of marginals. From these tables alone, and hence without weakening privacy, we find and output the "nearest" consistent set of marginals. Interestingly, this set is no farther from the privacy-preserving tables than the tables of the raw data are, and consequently the additional error introduced by the imposition of consistency is no more than the error introduced by the privacy mechanism itself. The privacy mechanism of [20] gives the strongest known privacy guarantees, with very little error. Combined with the techniques of the current paper, we therefore obtain excellent privacy, accuracy, and consistency among the tables. Moreover, our techniques are surprisingly efficient. Our techniques apply equally well to the logical cousin of the contingency table, the OLAP cube.
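The "nearest consistent set" argument in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a single two-way table, uses only its row and column marginals, measures distance in the L2 norm, and treats "consistent" as "row totals and column totals agree" (nonnegativity and integrality are ignored). The table values, the noise scale, and all function names are hypothetical.

```python
import math
import random


def true_marginals(table):
    """Row sums and column sums of a two-way table of counts."""
    rows = [float(sum(r)) for r in table]
    cols = [float(sum(c)) for c in zip(*table)]
    return rows, cols


def laplace(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginals(table, scale, rng):
    """Stand-in for a privacy mechanism: perturb each marginal cell independently."""
    rows, cols = true_marginals(table)
    return ([r + laplace(scale, rng) for r in rows],
            [c + laplace(scale, rng) for c in cols])


def nearest_consistent(rows, cols):
    """L2 projection onto the set where sum(rows) == sum(cols).

    Uses only the noisy marginals, never the raw table.
    """
    gap = sum(rows) - sum(cols)
    shift = gap / (len(rows) + len(cols))
    return [r - shift for r in rows], [c + shift for c in cols]


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)                    # fixed seed for repeatability
table = [[12, 7, 3], [5, 9, 14]]          # hypothetical raw counts
true_r, true_c = true_marginals(table)
noisy_r, noisy_c = noisy_marginals(table, 2.0, rng)   # hypothetical noise scale
fixed_r, fixed_c = nearest_consistent(noisy_r, noisy_c)

noise_err = l2(true_r + true_c, noisy_r + noisy_c)          # error from the mechanism
consistency_err = l2(fixed_r + fixed_c, noisy_r + noisy_c)  # error added by the repair
total_err = l2(fixed_r + fixed_c, true_r + true_c)          # error of the final output
```

Because the true marginals are themselves consistent, the nearest consistent point cannot be farther from the noisy marginals than the truth is, so `consistency_err <= noise_err`; the triangle inequality then gives `total_err <= 2 * noise_err`.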
Year
2007
DOI
10.1145/1265530.1265569
Venue
PODS
Keywords
consistent set, holistic solution, reported data, contingency table release, strongest known privacy guarantee, contingency table, synthetic data, raw data, additional error, excellent privacy, privacy mechanism, olap, privacy
Field
Revenue, Data mining, Official statistics, Computer science, Raw data, Synthetic data, Contingency table, OLAP cube, Online analytical processing, Special case
DocType
Conference
Citations
209
PageRank
14.16
References
19
Authors
6
Name                Order  Citations  PageRank
Boaz Barak          1      2563       127.61
Kamalika Chaudhuri  2      1503       96.90
Cynthia Dwork       3      9137       821.87
Satyen Kale         4      1436       90.68
Frank McSherry      5      4289       288.94
Kunal Talwar        6      4423       259.79