
De-Identification of Emails: Pseudonymizing Privacy-Sensitive Data in a German Email Corpus

2019-09-01 · RANLP 2019

Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn


Abstract

We deal with the pseudonymization of those stretches of text in emails that might allow the identification of real individuals. This task is decomposed into two steps. First, named entities carrying privacy-sensitive information (e.g., names of persons, locations, phone numbers, or dates) are identified; second, these privacy-bearing entities are replaced by synthetically generated surrogates (e.g., a person originally named 'John Doe' is renamed 'Bill Powers'). We describe a system architecture for surrogate generation and evaluate our approach on CodeAlltag, a German email corpus.
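The two-step decomposition described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' system: the regex-based phone detection, the caller-supplied list of known names, the tiny `SURROGATE_NAMES` pool, and all function names are hypothetical stand-ins chosen only to show the identify-then-replace flow. A real pipeline would presumably use a trained named entity recognizer and much richer surrogate resources.

```python
import random
import re

# Hypothetical surrogate pool (assumption); a real system would need far more names.
SURROGATE_NAMES = ["Bill Powers", "Anna Keller", "Tom Berger"]

# Simplified phone pattern (assumption): digits separated by spaces, slashes, or hyphens.
PHONE_PATTERN = re.compile(r"\+?\d[\d /-]{6,}\d")


def find_entities(text, known_names):
    """Step 1: return sorted (start, end, label) spans of sensitive entities."""
    spans = []
    for name in known_names:
        for match in re.finditer(re.escape(name), text):
            spans.append((match.start(), match.end(), "PERSON"))
    for match in PHONE_PATTERN.finditer(text):
        spans.append((match.start(), match.end(), "PHONE"))
    return sorted(spans)


def make_surrogate(original, label, rng):
    """Step 2 helper: generate a synthetic replacement of the same entity type."""
    if label == "PERSON":
        candidates = [n for n in SURROGATE_NAMES if n != original]
        return rng.choice(candidates)
    # PHONE: swap every digit for a random one, keeping separators and length.
    return "".join(str(rng.randrange(10)) if c.isdigit() else c for c in original)


def pseudonymize(text, known_names, seed=0):
    """Replace each detected entity; repeated mentions get the same surrogate."""
    rng = random.Random(seed)
    mapping = {}
    pieces = []
    cursor = 0
    for start, end, label in find_entities(text, known_names):
        if start < cursor:
            continue  # skip spans overlapping an already replaced entity
        original = text[start:end]
        if original not in mapping:
            mapping[original] = make_surrogate(original, label, rng)
        pieces.append(text[cursor:start])
        pieces.append(mapping[original])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


if __name__ == "__main__":
    email = "John Doe called. Reach John Doe at 0341 1234567."
    result, mapping = pseudonymize(email, ["John Doe"])
    print(result)
    print(mapping)
```

Keeping a mapping from original strings to surrogates means that repeated mentions of the same entity are replaced consistently within one email, which preserves coreference in the pseudonymized text. Whether the described system enforces consistency in this way is not stated in the abstract.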
