NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human

2024-06-06Unverified0· sign in to hype

Shuo Huang, William MacLean, Xiaoxi Kang, Anqi Wu, Lizhen Qu, Qiongkai Xu, Zhuang Li, Xingliang Yuan, Gholamreza Haffari

arXiv PDF

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

Increasing concerns about privacy leakage issues in academia and industry arise when employing NLP models from third-party providers to process sensitive texts. To protect privacy before sending sensitive data to those models, we suggest sanitizing sensitive text using two common strategies used by humans: i) deleting sensitive expressions, and ii) obscuring sensitive details by abstracting them. To explore the issues and develop a tool for text rewriting, we curate the first corpus, coined NAP^2, through both crowdsourcing and the use of large language models (LLMs). Compared to the prior works based on differential privacy, which lead to a sharp drop in information utility and unnatural texts, the human-inspired approaches result in more natural rewrites and offer an improved balance between privacy protection and data utility, as demonstrated by our extensive experiments.

Tasks

Privacy Preserving

NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human

Abstract

Tasks

Reproductions