Distributed L-diversity using spark-based algorithm for large resource description frameworks data - Citegraph

Paper Info

Title
Distributed L-diversity using spark-based algorithm for large resource description frameworks data

Abstract
Privacy protection issues for resource description frameworks (RDFs) have emerged over the use of public government open data and the healthcare data of individuals. As these data may include personal information, they must undergo a de-identification process that deletes or replaces parts of the original data. To enable these protections, a method has been developed to apply k-anonymization to RDF data. However, sensitive RDF information anonymized using k-anonymization is not completely secure and is vulnerable to attacks. In this paper, we propose an l-diversity anatomy de-identification method that can overcome the limitations of k-anonymity and guarantee stronger privacy protection than k-anonymization. Further, as this data anonymization process is computationally time-intensive, we use Spark distributed computing to provide rapid de-identification to enhance its utility. We also propose l-diversity preservation for dynamically evolving RDF datasets. Experimental results show that our proposed distributed l-diversity algorithm processes the data more efficiently than conventional approaches.
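
The abstract names l-diversity and the Anatomy algorithm without showing what they require of the data. The following is a minimal single-machine sketch of Anatomy-style grouping in plain Python, not the paper's Spark implementation. It assumes flat records with one sensitive attribute rather than RDF triples, and the function names (`anatomize`, `is_l_diverse`) and the `disease` field are hypothetical.

```python
from collections import defaultdict


def anatomize(records, sensitive_key, l):
    """Partition records into groups that each hold at least l distinct
    sensitive values (illustrative sketch, not the paper's algorithm)."""
    # Bucket the records by their sensitive value.
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec[sensitive_key]].append(rec)

    groups = []
    # While at least l buckets are non-empty, draw one record from each
    # of the l largest buckets to form a new group.
    while sum(1 for b in buckets.values() if b) >= l:
        largest = sorted(
            (v for v in buckets if buckets[v]),
            key=lambda v: len(buckets[v]),
            reverse=True,
        )[:l]
        groups.append([buckets[v].pop() for v in largest])

    # Place each leftover record in a group that lacks its sensitive value.
    for value, bucket in buckets.items():
        while bucket:
            rec = bucket.pop()
            target = next(
                (g for g in groups
                 if all(r[sensitive_key] != value for r in g)),
                None,
            )
            if target is None:
                raise ValueError("records cannot be made l-diverse")
            target.append(rec)
    return groups


def is_l_diverse(groups, sensitive_key, l):
    """True if every group contains at least l distinct sensitive values."""
    return all(len({r[sensitive_key] for r in g}) >= l for g in groups)


# Hypothetical sample data: two records for each of three diagnoses.
records = [
    {"age": 30 + i, "disease": d}
    for i, d in enumerate(["flu", "flu", "cold", "cold", "asthma", "asthma"])
]
groups = anatomize(records, "disease", 2)
```

In the published form of Anatomy, the quasi-identifiers and the sensitive values of each group are then released as two separate tables linked only by a group identifier, so an attacker who locates a person's group still faces at least l candidate sensitive values.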

Field	Value
Year	2021
DOI	10.1007/s11227-020-03583-6
Venue	The Journal of Supercomputing
Keywords	Privacy protection, Resource description framework (RDF), De-identification, l-diversity, Anatomy algorithm, Spark
DocType	Journal
Volume	77
Issue	7
ISSN	0920-8542
Citations	0
PageRank	0.34
References	2
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
MinHyuk Jeon	1	0	0.34
Odsuren Temuujin	2	0	0.34
Jinhyun Ahn	3	0	0.34
Dong-Hyuk Im	4	35	6.06
