Title
Accurate privacy-preserving record linkage for databases with missing values
Abstract
Privacy-preserving record linkage is the process of matching records that refer to the same entity across sensitive databases held by different organisations. This process is often challenging because no unique entity identifiers, such as social security numbers, are available in the databases to be linked. Therefore, quasi-identifying attributes, such as names and addresses, are required to identify records that are similar and likely refer to the same entity. However, such quasi-identifiers are often not allowed to be shared between organisations due to privacy and confidentiality concerns. Besides containing variations and errors, the quasi-identifier values used for linking can also be missing. A popular approach to link sensitive data in a privacy-preserving way is to encode quasi-identifying values into Bloom filters, bit vectors that allow approximate similarities between values to be calculated. However, with existing Bloom filter encoding approaches, missing values can lead to true matches being missed because they affect the similarities calculated between Bloom filters. In this paper we propose a novel approach to consider missing values in privacy-preserving record linkage by adapting Bloom filter encoding based on the patterns of missingness identified in the databases to be linked. We build a lattice structure of missingness patterns, and then generate partitions of Bloom filters over this lattice. In each partition the non-missing encoded quasi-identifying attributes are assigned different weights during the Bloom filter generation process. This results in more accurate similarity calculation and better linkage quality. To improve the privacy of our approach, each partition is encoded independently, which prevents both dictionary and frequency-based attacks.
We evaluate our approach on large databases that contain different amounts and patterns of missing values, showing that it can substantially outperform both Bloom filter encoding that does not consider missing values, and an earlier Bloom filter-based approach for linking sensitive databases that do contain missing values.
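The abstract states that missing values distort the similarities calculated between Bloom filters. The Python sketch below illustrates this under stated assumptions: it encodes the bigrams of each quasi-identifier into a Bloom filter and compares filters with the Dice coefficient, a similarity commonly used for Bloom filters (assumed here). It then shows a simplified, hypothetical pattern-aware variant that encodes each record once per subset of its non-missing attributes (nodes below its missingness pattern in the subset lattice) and compares two records only within the partition of their shared attributes. The filter length, number of hash functions, double hashing with SHA-256 and MD5, the attribute names, and the secret are all hypothetical choices. Per-partition attribute weighting is omitted. This is not the authors' implementation.

```python
import hashlib
from itertools import combinations

BF_LEN = 1000  # assumed Bloom filter length in bits
NUM_HASH = 10  # assumed number of hash functions per q-gram
ATTRIBUTES = ("first_name", "last_name", "city")  # hypothetical quasi-identifiers


def qgrams(value, q=2):
    """Return the set of overlapping q-grams of a padded, lower-cased string."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def encode(record, attributes, secret):
    """Hash the q-grams of the given attributes into one Bloom filter.

    The filter is represented as the set of positions of its 1-bits.
    Double hashing (h1 + i * h2) with SHA-256 and MD5 is an assumed choice.
    """
    bits = set()
    for attr in attributes:
        value = record.get(attr)
        if not value:  # a missing value contributes no bits at all
            continue
        for gram in qgrams(value):
            key = f"{secret}|{attr}|{gram}".encode()
            h1 = int(hashlib.sha256(key).hexdigest(), 16)
            h2 = int(hashlib.md5(key).hexdigest(), 16)
            for i in range(NUM_HASH):
                bits.add((h1 + i * h2) % BF_LEN)
    return bits


def dice(bf_a, bf_b):
    """Dice coefficient 2|A & B| / (|A| + |B|) between two Bloom filters."""
    total = len(bf_a) + len(bf_b)
    return 2 * len(bf_a & bf_b) / total if total else 0.0


def pattern(record):
    """Missingness pattern: the set of attributes that have a value."""
    return frozenset(a for a in ATTRIBUTES if record.get(a))


def encode_partitions(record, secret):
    """Encode a record once per non-empty subset of its non-missing attributes.

    Each subset is a lattice node below the record's own pattern. Mixing the
    subset into the secret makes each partition's encoding independent.
    """
    present = sorted(pattern(record))
    partitions = {}
    for k in range(1, len(present) + 1):
        for subset in combinations(present, k):
            node_secret = f"{secret}|{','.join(subset)}"
            partitions[frozenset(subset)] = encode(record, subset, node_secret)
    return partitions


def pattern_aware_similarity(rec_a, rec_b, secret):
    """Compare two records only within the partition of their shared attributes."""
    common = pattern(rec_a) & pattern(rec_b)
    if not common:
        return 0.0
    return dice(encode_partitions(rec_a, secret)[common],
                encode_partitions(rec_b, secret)[common])


# Usage: the same person, with the city missing in the second record.
a = {"first_name": "john", "last_name": "smith", "city": "sydney"}
b = {"first_name": "john", "last_name": "smith", "city": None}
naive = dice(encode(a, ATTRIBUTES, "k"), encode(b, ATTRIBUTES, "k"))
aware = pattern_aware_similarity(a, b, "k")
print(f"naive: {naive:.3f}  pattern-aware: {aware:.3f}")
```

In the naive encoding the bits of the missing city lower the Dice similarity of an otherwise identical pair. Comparing within the shared partition gives identical filters. Encoding every subset grows exponentially with the number of attributes, so this sketch only suits a handful of quasi-identifiers; a practical system would restrict itself to the missingness patterns actually observed in the databases.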
Year: 2022
DOI: 10.1016/j.is.2021.101959
Venue: Information Systems
Keywords: Missing data, Privacy, Entity resolution, Data linkage, Bloom filter encoding
DocType: Journal
Volume: 106
ISSN: 0306-4379
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name | Order | Citations | PageRank
Sirintra Vaiwsri | 1 | 0 | 0.34
Thilina Ranbaduge | 2 | 12 | 3.64
Peter Christen | 3 | 1697 | 107.21
Rainer Schnell | 4 | 0 | 0.68