Title
Differentially Private Clustering in High-Dimensional Euclidean Spaces.
Abstract
We study the problem of clustering sensitive data while preserving the privacy of individuals represented in the dataset, which has broad applications in practical machine learning and data analysis tasks. Although the problem has been widely studied in the context of low-dimensional, discrete spaces, much less is known about private clustering in high-dimensional Euclidean spaces $\mathbb{R}^d$. In this work, we give differentially private and efficient algorithms achieving strong guarantees for $k$-means and $k$-median clustering when $d=\Omega(\mathsf{polylog}(n))$. Our algorithm achieves a clustering loss of at most $\log^3(n)\mathsf{OPT}+\mathsf{poly}(\log n,d,k)$, improving on the previous state-of-the-art bound of $\sqrt{d}\mathsf{OPT}+\mathsf{poly}(\log n,d^d,k^d)$. We also study the case where the data points are $s$-sparse and show that the clustering loss can scale logarithmically with $d$, i.e., $\log^3(n)\mathsf{OPT}+\mathsf{poly}(\log n,\log d,k,s)$. Experiments on both synthetic and real datasets verify the effectiveness of the proposed method.
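The guarantees above are stated as a clustering loss bounded by a multiplicative factor times OPT (the optimal $k$-means or $k$-median cost) plus an additive term. As an illustration only, the sketch below shows how such quantities are measured. It is not the authors' algorithm and contains no privacy mechanism. It computes the $k$-means cost (sum of squared distances to the nearest center, `p=2`) and the $k$-median cost (sum of distances, `p=1`), checks a bound of the form loss $\le \alpha \cdot \mathsf{OPT} + \beta$, and tests the $s$-sparsity condition. The function names and any $\alpha$, $\beta$ values are hypothetical placeholders, not the paper's constants.

```python
import math
from typing import Sequence

Point = Sequence[float]

def clustering_cost(points: Sequence[Point], centers: Sequence[Point], p: int) -> float:
    """Hypothetical helper: k-means cost when p=2, k-median cost when p=1.

    Each point pays its Euclidean distance to the nearest center, raised to p.
    """
    return sum(min(math.dist(x, c) for c in centers) ** p for x in points)

def within_bound(cost: float, opt: float, alpha: float, beta: float) -> bool:
    """Hypothetical helper: check a guarantee of the form cost <= alpha * OPT + beta."""
    return cost <= alpha * opt + beta

def is_s_sparse(x: Point, s: int) -> bool:
    """A point is s-sparse if it has at most s nonzero coordinates."""
    return sum(1 for v in x if v != 0) <= s

# Four points on a line; with k = 2, centers (1, 0) and (11, 0) give k-means cost 4.
pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
opt = clustering_cost(pts, [(1.0, 0.0), (11.0, 0.0)], p=2)
cost = clustering_cost(pts, [(0.0, 0.0), (11.0, 0.0)], p=2)
print(opt, cost, within_bound(cost, opt, alpha=2.0, beta=0.0))
```

The additive term $\beta$ is what a private algorithm typically pays for privacy; the abstract's contribution is making both the multiplicative factor and this additive term small in high dimensions.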
Year: 2017
Venue: ICML
Field: Pattern recognition, Computer science, Artificial intelligence, Euclidean geometry, Cluster analysis
DocType: Conference
Citations: 0
PageRank: 0.34
References: 12
Authors: 5
Name                  Order  Citations  PageRank
Maria-Florina Balcan  1      1445       105.01
Travis Dick           2      23         5.94
Yingyu Liang          3      393        31.39
Wenlong Mou           4      12         3.97
Hongyang Zhang        5      110        8.33