Title
Efficient User Opt-Out from Block Stores
Abstract
Massive collection of user or device data is a growing trend for personalizing services. Cheap and scalable storage is key to enabling advanced analysis in the long run. In this context, block stores (HDFS) and columnar stores (Dremel, Parquet) are increasingly leveraged. Data protection acts, or simply the desire for transparency toward the user, require that data be opted out on demand. Unfortunately, those data stores have departed from traditional databases and do not provide efficient access to, or deletion of, specific pieces of data. In this paper, we study how to cost-efficiently opt out user data from these stores. We apply two intuitive strategies (systematic erasure and encryption) to the context of big data systems. We model their respective costs and show that, in the context of a service running atop Amazon Web Services, there is no general winning strategy (except in the special case where data cannot be compressed). Application constraints, such as data arrival and user opt-out rates, should then be considered to select the most cost-efficient opt-out strategy, while the practical levers are the sharding policy and the careful setting of block sizes.
Year: 2016
DOI: 10.1109/IC2EW.2016.7
Venue: 2016 IEEE International Conference on Cloud Engineering Workshop (IC2EW)
Keywords: Block/column stores, opt-out, data deletion, cost modeling, AWS
Field: Data modeling, Transparency (graphic), Computer science, Encryption, Data Protection Act 1998, Big data, Database, Erasure, Scalability, Special case
DocType: Conference
ISBN: 978-1-5090-3685-1
Citations: 0
PageRank: 0.34
References: 10
Authors: 2
Name                  Order  Citations  PageRank
Erwan Le Merrer       1      322        23.58
Nicolas Le Scouarnec  2      201        11.91