Title
Privacy-aware dynamic feature selection
Abstract
Big data will enable the development of novel services that enhance a company's market advantage, competition, or productivity. At the same time, the use of such a service could disclose sensitive data in the process, which raises significant privacy concerns. To protect individuals, various policies, such as the Code of Fair Information Practices, as well as recent laws, require organizations to capture only the minimal amount of data necessary to support a service. While this is a notable goal, choosing the minimal data is a non-trivial process, especially when privacy and utility constraints must both be considered. In this paper, we introduce a technique to minimize sensitive data disclosure by focusing on privacy-aware feature selection. During model deployment, the service provider requests only a subset of the available features from the client, such that it can produce results with maximal confidence while minimizing its ability to violate the client's privacy. We propose an iterative approach, where the server requests information one feature at a time until the client-specified privacy budget is exhausted. The overall process is dynamic, such that the feature selected at each step depends on the previously selected features and their corresponding values. We demonstrate our technique with three popular classification algorithms and perform an empirical analysis over three real-world datasets to illustrate that, in almost all cases, classifiers that select features using our strategy have the same error rate as state-of-the-art static feature selection methods that fail to preserve privacy.
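The iterative procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the naive Bayes model, the per-feature privacy costs, the greedy "lowest expected posterior entropy" criterion, and all names (posterior, expected_entropy, dynamic_select, PRIOR, LIKELIHOOD, COST, RECORD) are assumptions chosen for illustration only. The sketch shows a server that requests one feature at a time, conditions each choice on the values already disclosed, and stops once no remaining feature fits in the client's budget.

```python
import math

def posterior(prior, likelihood, observed):
    """Naive Bayes class posterior given the feature values disclosed so far."""
    scores = {}
    for label, p in prior.items():
        for feature, value in observed.items():
            p *= likelihood[feature][label][value]
        scores[label] = p
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

def entropy(dist):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def expected_entropy(prior, likelihood, observed, feature):
    """Expected posterior entropy if `feature` were disclosed next."""
    current = posterior(prior, likelihood, observed)
    values = next(iter(likelihood[feature].values()))
    total = 0.0
    for value in values:
        # P(value | observed) under the naive Bayes assumption
        p_value = sum(current[c] * likelihood[feature][c][value] for c in current)
        if p_value > 0:
            updated = posterior(prior, likelihood, {**observed, feature: value})
            total += p_value * entropy(updated)
    return total

def dynamic_select(prior, likelihood, cost, budget, record):
    """Greedily request features until the privacy budget is exhausted.

    Returns (predicted label, features in request order, unused budget).
    """
    observed, remaining = {}, budget
    while True:
        affordable = [f for f in likelihood
                      if f not in observed and cost[f] <= remaining]
        if not affordable:
            break
        # The choice depends on the values already disclosed (dynamic selection).
        best = min(affordable,
                   key=lambda f: expected_entropy(prior, likelihood, observed, f))
        observed[best] = record[best]  # the client discloses only this feature
        remaining -= cost[best]
    final = posterior(prior, likelihood, observed)
    return max(final, key=final.get), list(observed), remaining

# Hypothetical toy model and client record (illustrative values, not from the paper).
PRIOR = {"pos": 0.5, "neg": 0.5}
LIKELIHOOD = {
    "age":    {"pos": {"young": 0.9, "old": 0.1}, "neg": {"young": 0.2, "old": 0.8}},
    "zip":    {"pos": {"a": 0.5, "b": 0.5},       "neg": {"a": 0.5, "b": 0.5}},
    "income": {"pos": {"low": 0.3, "high": 0.7},  "neg": {"low": 0.6, "high": 0.4}},
}
COST = {"age": 2, "zip": 1, "income": 1}  # assumed privacy cost per feature
RECORD = {"age": "old", "zip": "a", "income": "high"}

if __name__ == "__main__":
    print(dynamic_select(PRIOR, LIKELIHOOD, COST, 3, RECORD))
```

With a budget of 3 in this toy setting, the sketch requests age first and then income, and never asks for the uninformative zip feature. With a budget of 1 it can only afford income or zip and requests income. How the paper actually quantifies privacy cost and confidence is not stated in the abstract, so both are stand-ins here.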
Year: 2015
DOI: 10.1109/ICDE.2015.7113274
Venue: Data Engineering
Keywords: big data, data privacy, feature selection, pattern classification, classifier, empirical analysis, error-rate, iterative approach, privacy constraints, privacy-aware dynamic feature selection, sensitive data disclosure minimization, utility constraints, error rate, privacy, niobium, decision trees, servers, probability, measurement
Field: Decision tree, Data mining, Feature selection, Computer science, Computer security, Server, FTC Fair Information Practice, Service provider, Information privacy, Big data, Database, Privacy software
DocType: Conference
ISSN: 1084-4627
Citations: 3
PageRank: 0.38
References: 34
Authors: 4
Name                Order  Citations  PageRank
Erman Pattuk        1      61         5.01
Murat Kantarcioglu  2      2470       168.03
Huseyin Ulusoy      3      63         5.43
Bradley Malin       4      1302       113.97