Sentence-level Privacy for Document Embeddings

2021-11-16 · ACL ARR November 2021

Anonymous


Abstract

User language data can contain highly sensitive personal content. As such, it is imperative to offer users a strong and interpretable privacy guarantee when learning from their data. In this work we propose SentDP, pure local differential privacy at the sentence level for a single user document. We propose a novel technique, DeepCandidate, that combines concepts from robust statistics and language modeling to produce high (768) dimensional, general ε-SentDP document embeddings. This guarantees that any single sentence in a document can be substituted with any other sentence while keeping the embedding ε-indistinguishable. Our experiments indicate that these private document embeddings are useful for downstream tasks like sentiment analysis and topic classification and even outperform baseline methods with weaker guarantees like word-level Metric DP.
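
The substitution guarantee described in the abstract can be written out as an inequality. What follows is a sketch of the standard pure differential privacy condition specialized to sentence substitution, not the paper's own formal statement. The symbols M (the randomized embedding mechanism), x and x' (documents), and S (a set of possible embeddings) are notation assumed here for illustration.

```latex
% Assumed notation (not from the source):
%   \mathcal{M}  randomized mechanism mapping a document to a 768-dimensional embedding
%   x, x'        two documents that differ by substituting exactly one sentence
%   S            any measurable set of embeddings, S \subseteq \mathbb{R}^{768}
%   \epsilon     the privacy parameter, \epsilon \ge 0
\Pr[\mathcal{M}(x) \in S] \;\le\; e^{\epsilon} \, \Pr[\mathcal{M}(x') \in S]
```

Because the condition must hold in both directions for every such pair x, x', a smaller ε forces the two output distributions closer together, so an observer of the embedding learns less about which sentence was present.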

Tasks

Language Modeling, Sentence, Sentiment Analysis, Topic Classification
