Title
Efficient Privacy-Aware Record Integration.
Abstract
The integration of information dispersed among multiple repositories is a crucial step for accurate data analysis in various domains. In support of this goal, it is critical to devise procedures for identifying similar records across distinct data sources. At the same time, to adhere to privacy regulations and policies, such procedures should protect the confidentiality of the individuals to whom the information corresponds. Various private record linkage (PRL) protocols have been proposed to achieve this goal, involving secure multi-party computation (SMC) and similarity-preserving data transformation techniques. SMC methods provide secure and accurate solutions to the PRL problem, but are prohibitively expensive in practice, mainly due to excessive computational requirements. Data transformation techniques offer more practical solutions, but incur the cost of information leakage and false matches. In this paper, we introduce a novel model for practical PRL, which 1) affords controlled and limited information leakage and 2) avoids false matches resulting from data transformation. First, we partition the data sources into blocks to eliminate comparisons between records that are unlikely to match. Then, to identify matches, we apply an efficient SMC technique to the candidate record pairs. To enable efficiency and privacy, our model leaks a controlled amount of obfuscated data prior to the secure computations. The applied obfuscation relies on differential privacy, which provides strong privacy guarantees against adversaries with arbitrary background knowledge. In addition, we illustrate the practical nature of our approach through an empirical analysis with data derived from public voter records.
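The idea of releasing obfuscated block information under differential privacy can be illustrated with a minimal sketch. This is not the authors' protocol: it only shows the standard Laplace mechanism applied to per-block record counts, under several assumptions that do not come from the abstract. The blocking key (first letter of a surname), the public key domain A to Z, the record layout, and all function names are hypothetical, and the secure comparison step itself is not modelled.

```python
import random
import string
from collections import Counter

# Assumed public domain of blocking keys; noise is added to every key,
# including empty blocks, so the released key set reveals nothing by itself.
DOMAIN = list(string.ascii_uppercase)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def block_key(record):
    """Hypothetical blocking key: upper-cased first letter of the surname."""
    return record["surname"][0].upper()


def noisy_block_counts(records, epsilon, rng):
    """Release one Laplace-noised count per key in DOMAIN.

    Adding or removing a single record changes exactly one count by 1,
    so the L1 sensitivity of the histogram is 1 and the noise scale is
    1 / epsilon. Records whose key falls outside DOMAIN are ignored.
    """
    counts = Counter(block_key(r) for r in records)
    scale = 1.0 / epsilon
    return {key: counts.get(key, 0) + laplace_noise(scale, rng) for key in DOMAIN}


def estimated_comparisons(noisy_a, noisy_b):
    """Estimate the number of candidate pairs left after blocking.

    Only records that share a key are compared; negative noisy counts are
    clamped to zero, which is post-processing and costs no extra privacy.
    """
    return sum(max(0.0, noisy_a[k]) * max(0.0, noisy_b[k]) for k in DOMAIN)


rng = random.Random(7)  # fixed seed only to make the sketch repeatable
source_a = [{"surname": "Kuzu"}, {"surname": "Kantarcioglu"}, {"surname": "Inan"}]
source_b = [{"surname": "Kuzu"}, {"surname": "Bertino"}, {"surname": "Malin"}]
released_a = noisy_block_counts(source_a, epsilon=0.5, rng=rng)
released_b = noisy_block_counts(source_b, epsilon=0.5, rng=rng)
print(round(estimated_comparisons(released_a, released_b), 2))
```

A smaller epsilon adds more noise and so leaks less about any single record, at the price of a less accurate estimate of how many candidate pairs must go through the expensive secure comparison.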
Year: 2013
DOI: 10.1145/2452376.2452398
Venue: EDBT
Keywords: record linkage, information leakage, data transformation technique, data source, limited information leakage, privacy, distinct data source, efficient privacy-aware record integration, data transformation, differential privacy, obfuscated data, information corresponds, accurate data analysis, security, biomedical research, performance, bioinformatics
Field: Data mining, Record linkage, Information leakage, Differential privacy, Confidentiality, Computer science, Obfuscation, Database, Privacy law, Computation
DocType: Conference
Citations: 14
PageRank: 0.66
References: 22
Authors
6
Name | Order | Citations | PageRank
Mehmet Kuzu | 1 | 310 | 13.37
Murat Kantarcioglu | 2 | 2470 | 168.03
Ali Inan | 3 | 118 | 7.22
Elisa Bertino | 4 | 14025 | 2128.50
Elizabeth Durham | 5 | 81 | 3.18
Bradley Malin | 6 | 1302 | 113.97