Abstract | ||
---|---|---|
Search engine companies collect the "database of intentions", the histories
of their users' search queries. These search logs are a gold mine for
researchers. Search engine companies, however, are wary of publishing search
logs in order not to disclose sensitive information. In this paper we analyze
algorithms for publishing frequent keywords, queries and clicks of a search
log. We first show how methods that achieve variants of $k$-anonymity are
vulnerable to active attacks. We then demonstrate that the stronger guarantee
ensured by $\epsilon$-differential privacy unfortunately does not provide any
utility for this problem. We then propose an algorithm, ZEALOUS, and show how to
set its parameters to achieve $(\epsilon,\delta)$-probabilistic privacy. We
also contrast our analysis of ZEALOUS with an analysis by Korolova et al. [17]
that achieves $(\epsilon',\delta')$-indistinguishability. Our paper concludes
with a large experimental study using real applications where we compare
ZEALOUS and previous work that achieves $k$-anonymity in search log publishing.
Our results show that ZEALOUS yields comparable utility to $k$-anonymity while
at the same time achieving much stronger privacy guarantees. |
Year | Venue | Keywords |
---|---|---|
2009 | Computing Research Repository (CoRR) | information retrieval, search engine |
Field | DocType | Volume |
Data mining, Search engine, Information retrieval, Computer science, Publishing, Information sensitivity, Database | Journal | abs/0904.0 |
Citations | PageRank | References |
18 | 1.76 | 27 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Michaela Götz | 1 | 246 | 10.62 |
Ashwin Machanavajjhala | 2 | 2624 | 132.52 |
Guozhang Wang | 3 | 403 | 17.55 |
Xiaokui Xiao | 4 | 3266 | 142.32 |
Johannes Gehrke | 5 | 13362 | 1055.06 |