Title
Parallelizing Record Linkage for Disclosure Risk Assessment
Abstract
Handling very large volumes of confidential data is becoming common practice in many organizations, such as statistical agencies. This calls for protection methods that must be validated in terms of the quality they provide. Record Linkage (RL) makes it possible to compute the disclosure risk, which measures the quality of a data protection method. However, the RL methods proposed in the literature are computationally costly, which poses difficulties when RL processes have to be run frequently on large data. Here, we propose a distributed computing technique to improve the performance of an RL process. We show that our technique not only significantly reduces the computing time of an RL process, but is also scalable in a distributed environment. We also show that distributed computation can be complemented with SMP-based parallelization in each node, increasing the final speedup.
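As an illustration of the idea in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes distance-based record linkage over numeric records, measures disclosure risk as the fraction of protected records whose nearest original record is their true source, and splits the protected records into chunks that are linked independently. The function names, the Euclidean distance, and the round-robin chunking are assumptions made for this example. Threads stand in for the nodes of a distributed environment; in CPython they will not speed up CPU-bound work, so a process pool or a cluster framework would be needed for real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def nearest_index(record, originals):
    """Return the index of the original record closest to `record` (Euclidean)."""
    return min(range(len(originals)), key=lambda j: math.dist(record, originals[j]))


def count_correct_links(chunk, originals):
    """Count records in the chunk that link back to their true original."""
    return sum(1 for true_idx, rec in chunk if nearest_index(rec, originals) == true_idx)


def disclosure_risk(originals, protected, workers=2):
    """Fraction of protected records re-identified by distance-based linkage.

    protected[i] is assumed to be the masked version of originals[i].
    """
    indexed = list(enumerate(protected))
    # Round-robin partition: each worker links its own chunk against all originals.
    chunks = [indexed[w::workers] for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(count_correct_links, chunks, [originals] * workers)
    return sum(counts) / len(protected)
```

Because each chunk is linked against the full original file independently, the partial counts can simply be summed, which is what makes this kind of computation straightforward to distribute.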
Year: 2008
DOI: 10.1007/978-3-540-87471-3_16
Venue: Privacy in Statistical Databases
Keywords: rl process, large volume, rl method, confidential data, parallel computing, data protection method, record linkage, disclosure risk evaluation, protection method, parallelizing record linkage, large data, computing time, distributed computing, frequent rl process, disclosure risk assessment, distributed environment, data protection, parallel computer
Field: Record linkage, Data mining, Confidentiality, Distributed Computing Environment, Computer science, Risk assessment, Data Protection Act 1998, Speedup, Scalability, Distributed computing, Computation
DocType: Conference
Volume: 5262
ISSN: 0302-9743
Citations: 3
PageRank: 0.46
References: 9
Authors: 5
Name | Order | Citations | PageRank
Joan Guisado-Gámez | 1 | 16 | 3.11
Arnau Prat-Pérez | 2 | 227 | 13.44
Jordi Nin | 3 | 311 | 26.53
Victor Muntés-Mulero | 4 | 204 | 22.79
Josep-Ll. Larriba-Pey | 5 | 16 | 2.38