Abstract |
---|
Texts convey sophisticated knowledge, but they also convey sensitive information. Despite the success of general-purpose language models and domain-specific mechanisms with differential privacy (DP), existing text sanitization mechanisms still provide low utility, a consequence of the curse of dimensionality in text representations. The companion issue of utilizing sanitized texts for downstream analytics is also under-explored. This paper takes a direct approach to text sanitization. Our insight is to consider both sensitivity and similarity via our new local DP notion. The sanitized texts also contribute to our sanitization-aware pretraining and fine-tuning, enabling privacy-preserving natural language processing over the BERT language model with promising utility. Surprisingly, the high utility does not boost the success rate of inference attacks. |
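The abstract's core idea, sanitizing text token by token under a local DP notion that accounts for semantic similarity, is often instantiated with an exponential-mechanism-style sampler: each token is replaced by a vocabulary word drawn with probability proportional to exp(ε · similarity / 2). A minimal sketch of that general technique follows; the toy embeddings, function names, and ε value are illustrative assumptions, not the paper's actual mechanism.

```python
import math
import random

# Toy word vectors (hypothetical; a real system would use trained embeddings).
EMBED = {
    "good": (1.0, 0.2), "great": (0.9, 0.3), "fine": (0.8, 0.1),
    "bad": (-1.0, 0.1), "poor": (-0.9, 0.2),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    return dot / (norm_u * norm_v)

def sanitize_token(token, epsilon, rng):
    """Sample a replacement word with probability proportional to
    exp(epsilon * similarity / 2), i.e. the exponential mechanism."""
    if token not in EMBED:
        return token  # out-of-vocabulary tokens pass through unchanged
    weights = [math.exp(epsilon * cosine(EMBED[token], vec) / 2)
               for vec in EMBED.values()]
    return rng.choices(list(EMBED), weights=weights, k=1)[0]

rng = random.Random(0)
print([sanitize_token(t, epsilon=4.0, rng=rng) for t in ["good", "bad", "movie"]])
```

Larger ε concentrates probability on words close to the original (better utility), while smaller ε flattens the distribution (stronger privacy); the paper's contribution is a refined notion of which tokens are sensitive and how similarity is measured, which this sketch does not capture.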
Year | Venue | DocType
---|---|---
2021 | ACL/IJCNLP | Conference

Volume | Citations | PageRank
---|---|---
2021.findings-acl | 0 | 0.34

References | Authors
---|---
0 | 6
Name | Order | Citations | PageRank |
---|---|---|---|
Xiang Yue | 1 | 3 | 4.78 |
Minxin Du | 2 | 0 | 0.34 |
Tianhao Wang | 3 | 69 | 10.79 |
Yaliang Li | 4 | 0 | 0.68 |
Huan Sun | 5 | 333 | 34.97 |
Sherman S. M. Chow | 6 | 1870 | 98.03 |