Title
Distributed L-diversity using spark-based algorithm for large resource description frameworks data
Abstract
Privacy protection issues for Resource Description Framework (RDF) data have emerged with the use of public government open data and the healthcare data of individuals. Because these data may include personal information, they must undergo a de-identification process that deletes or replaces parts of the original data. To provide this protection, a method has been developed that applies k-anonymization to RDF data. However, sensitive RDF information anonymized with k-anonymization is not completely secure and remains vulnerable to attacks. In this paper, we propose an l-diversity anatomy de-identification method that overcomes the limitations of k-anonymity and guarantees stronger privacy protection than k-anonymization. Further, because this anonymization process is computationally expensive, we use Spark distributed computing to de-identify data rapidly and so make the method more practical. We also propose l-diversity preservation for dynamically evolving RDF datasets. Experimental results show that our proposed distributed l-diversity algorithm processes the data more efficiently than conventional approaches.
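The gap between k-anonymity and l-diversity mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's anatomy algorithm and does not use Spark or RDF triples; it assumes flat records, the simplest "distinct" variant of l-diversity, and hypothetical names (`distinct_l_diversity`, `age`, `zip`, `disease`) chosen only for illustration.

```python
from collections import defaultdict


def distinct_l_diversity(records, quasi_ids, sensitive):
    """Return (k, l) for a list of dict records.

    k is the size of the smallest quasi-identifier group and l is the
    smallest number of distinct sensitive values found in any group.
    """
    groups = defaultdict(list)
    for rec in records:
        # Records sharing the same quasi-identifier values form one group.
        key = tuple(rec[q] for q in quasi_ids)
        groups[key].append(rec[sensitive])
    k = min(len(vals) for vals in groups.values())
    l = min(len(set(vals)) for vals in groups.values())
    return k, l


# Hypothetical generalized records (illustrative data only).
records = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "asthma"},
]

k, l = distinct_l_diversity(records, ["age", "zip"], "disease")
print(k, l)
```

In this toy data every group has three records, so the table is 3-anonymous, yet the first group contains a single disease value. Anyone who can place a person in that group learns the sensitive value, which is the kind of weakness l-diversity is meant to address.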
Year
2021
DOI
10.1007/s11227-020-03583-6
Venue
The Journal of Supercomputing
Keywords
Privacy protection, Resource description framework (RDF), De-identification, l-diversity, Anatomy algorithm, Spark
DocType
Journal
Volume
77
Issue
7
ISSN
0920-8542
Citations
0
PageRank
0.34
References
2
Authors
4
Name              Order  Citations  PageRank
MinHyuk Jeon      1      0          0.34
Odsuren Temuujin  2      0          0.34
Jinhyun Ahn       3      0          0.34
Dong-Hyuk Im      4      35         6.06