Title
Clustering-Based Scalable Indexing for Multi-party Privacy-Preserving Record Linkage
Abstract
The identification of common sets of records in multiple databases has become an increasingly important subject in many application areas, including banking, health, and national security. Often privacy concerns and regulations prevent the owners of the databases from sharing any sensitive details of their records with each other or with any other party. The linkage of records in multiple databases while preserving privacy and confidentiality is an emerging research discipline known as privacy-preserving record linkage (PPRL). We propose a novel two-step indexing (blocking) approach for PPRL between multiple (more than two) parties. First, we generate small mini-blocks using a multi-bit Bloom filter splitting method; second, we merge these mini-blocks based on their similarity using a novel hierarchical canopy clustering technique. An empirical study conducted with large datasets of up to one million records shows that our approach is scalable with the size of the datasets and the number of parties, while providing better privacy than previous multi-party indexing approaches.
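To make the two indexing steps more concrete, the following is a minimal, single-party Python sketch under stated assumptions; it is not the authors' protocol. Records are encoded as bigram Bloom filters held as Python integers (m = 64 bits and k = 3 double-hashed positions per bigram are hypothetical choices). The hypothetical `split_mini_blocks` recursively splits records on the unused bit positions whose fraction of 1-bits is closest to 0.5, several bits per round, until each block holds at most `max_size` records. The hypothetical `merge_mini_blocks` then greedily merges similar mini-blocks using the Dice coefficient of OR-ed representative filters; this is a simplified single-threshold, canopy-style merge, not the paper's hierarchical canopy clustering. All communication between parties is omitted.

```python
import hashlib


def qgrams(value, q=2):
    """Return the set of q-grams of a padded, lower-cased string."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(value, m=64, k=3):
    """Encode a string's bigrams into an m-bit Bloom filter stored as an int.

    Uses double hashing (SHA-1 and MD5) to derive k bit positions per bigram.
    """
    bits = 0
    for gram in qgrams(value):
        h1 = int(hashlib.sha1(gram.encode()).hexdigest(), 16)
        h2 = int(hashlib.md5(gram.encode()).hexdigest(), 16)
        for i in range(k):
            bits |= 1 << ((h1 + i * h2) % m)
    return bits


def popcount(x):
    return bin(x).count("1")


def dice(a, b):
    """Dice coefficient of two Bloom filters held as ints."""
    total = popcount(a) + popcount(b)
    return 2 * popcount(a & b) / total if total else 0.0


def split_mini_blocks(records, m, max_size, bits_per_split=2, used=frozenset()):
    """Recursively split (rec_id, bloom_filter) pairs into mini-blocks (hypothetical)."""
    if len(records) <= max_size or len(used) >= m:
        return [records]
    n = len(records)
    # Prefer unused bit positions whose fraction of 1-bits is closest to 0.5.
    candidates = sorted(
        (p for p in range(m) if p not in used),
        key=lambda p: abs(sum((bf >> p) & 1 for _, bf in records) / n - 0.5),
    )
    chosen = candidates[:bits_per_split]
    # Group records by their values at the chosen bits (up to 2**len(chosen) groups).
    groups = {}
    for rec_id, bf in records:
        key = tuple((bf >> p) & 1 for p in chosen)
        groups.setdefault(key, []).append((rec_id, bf))
    next_used = used | frozenset(chosen)
    blocks = []
    for group in groups.values():
        blocks.extend(split_mini_blocks(group, m, max_size, bits_per_split, next_used))
    return blocks


def block_rep(block):
    """Representative filter of a block: bitwise OR of its members (an assumption)."""
    rep = 0
    for _, bf in block:
        rep |= bf
    return rep


def merge_mini_blocks(blocks, threshold, max_size):
    """Greedy canopy-style merge of similar mini-blocks under a size cap (hypothetical)."""
    pool = sorted(blocks, key=len)  # seed with the smallest mini-blocks first
    merged = []
    while pool:
        seed = pool.pop(0)
        cluster, rep = list(seed), block_rep(seed)
        remaining = []
        for blk in pool:
            blk_rep = block_rep(blk)
            if dice(rep, blk_rep) >= threshold and len(cluster) + len(blk) <= max_size:
                cluster.extend(blk)
                rep |= blk_rep
            else:
                remaining.append(blk)
        pool = remaining
        merged.append(cluster)
    return merged


names = ["peter", "pete", "petra", "john", "jon",
         "joan", "mary", "marie", "smith", "smyth"]
records = [(i, bloom_encode(name)) for i, name in enumerate(names)]
mini_blocks = split_mini_blocks(records, m=64, max_size=3)
blocks = merge_mini_blocks(mini_blocks, threshold=0.5, max_size=4)
for block in blocks:
    print([names[i] for i, _ in block])
```

Choosing bits that are set in roughly half of the records tends to produce balanced splits, and splitting on several bits per round yields up to 2^b child blocks at once, which reduces the recursion depth compared with splitting on one bit at a time.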
Year: 2015
DOI: 10.1007/978-3-319-18032-8_43
Venue: Advances in Knowledge Discovery and Data Mining, Part II
Keywords: Hierarchical canopy clustering, Bloom filters, Scalability
Field: Canopy clustering algorithm, Record linkage, Data mining, Bloom filter, Confidentiality, Computer science, Search engine indexing, Cluster analysis, Empirical research, Scalability
DocType: Conference
Volume: 9078
ISSN: 0302-9743
Citations: 6
PageRank: 0.47
References: 25
Authors: 3
Name | Order | Citations | PageRank
Thilina Ranbaduge | 1 | 12 | 3.64
Dinusha Vatsalan | 2 | 209 | 19.57
Peter Christen | 3 | 1697 | 107.21