Title
Scalable Blocking for Privacy Preserving Record Linkage
Abstract
When dealing with sensitive and personal user data, the process of record linkage raises privacy issues. Thus, privacy-preserving record linkage has emerged with the goal of identifying matching records across multiple data sources while preserving the privacy of the individuals they describe. The task is very resource-demanding, given the abundance of available data, which are also often dirty. Blocking techniques are deployed prior to matching to prune candidate records that are unlikely to match, reducing processing time. However, when scaling to large datasets, such methods often result in quality loss. To this end, we propose Multi-Sampling Transitive Closure for Encrypted Fields (MS-TCEF), a novel privacy-preserving blocking technique based on the use of reference sets. Our new method effectively prunes records based on redundant assignments to blocks, providing better fault tolerance and maintaining result quality while scaling linearly with respect to the dataset size. We provide a theoretical analysis of the method's complexity and show how it outperforms state-of-the-art privacy-preserving blocking techniques with respect to both recall and processing cost.
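To make the idea of blocking with reference sets and redundant block assignments concrete, here is a minimal sketch of the general approach. It is not the MS-TCEF algorithm described in the paper. The function names, the bigram Dice similarity, the parameter k, and the toy data are all assumptions chosen for illustration. In an actual privacy-preserving setting, each party would share only block identifiers or encoded values, never the plaintext fields used here.

```python
from collections import defaultdict
from itertools import product


def bigrams(s):
    """Return the set of character bigrams of a lowercased string."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Dice coefficient between the bigram sets of two strings."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


def assign_blocks(records, reference_set, k=2):
    """Assign each record to the blocks of its k most similar reference
    values (hypothetical helper). Assigning a record to more than one
    block is the redundancy that tolerates dirty values."""
    blocks = defaultdict(set)
    for rec_id, value in records.items():
        scored = sorted(((dice(value, r), r) for r in reference_set), reverse=True)
        for sim, ref in scored[:k]:
            if sim > 0:  # skip reference values with no overlap at all
                blocks[ref].add(rec_id)
    return blocks


def candidate_pairs(blocks_a, blocks_b):
    """Return cross-source record pairs that share at least one block."""
    pairs = set()
    for ref in blocks_a.keys() & blocks_b.keys():
        pairs.update(product(blocks_a[ref], blocks_b[ref]))
    return pairs


# Toy data (assumed): a public reference set and two private sources.
reference_set = ["smith", "johnson", "miller", "garcia"]
source_a = {"a1": "smith", "a2": "garcia"}
source_b = {"b1": "smyth", "b2": "millar"}

pairs = candidate_pairs(
    assign_blocks(source_a, reference_set),
    assign_blocks(source_b, reference_set),
)
print(sorted(pairs))
```

In this toy run, only two of the four possible cross-source pairs survive blocking, and the misspelled "smyth" still lands in the same block as "smith". Every pair involving "a2" is pruned before any matching step.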
Year: 2015
DOI: 10.1145/2783258.2783290
Venue: ACM Knowledge Discovery and Data Mining
Keywords: Private blocking, performance, reference sets
Field: Data mining, Record linkage, Multiple data, Computer science, Encryption, Blocking techniques, Transitive closure, Scalability
DocType: Conference
Citations: 1
PageRank: 0.35
References: 17
Authors: 3
Name, Order, Citations, PageRank
Alexandros Karakasidis, 1, 153, 11.58
Georgia Koloniari, 2, 220, 16.49
Vassilios S. Verykios, 3, 1402, 96.96