HB Deid - HB De-identification tool demonstrator
2021-05-01, NoDaLiDa 2021
Hanna Berg, Hercules Dalianis
Abstract
This paper describes a freely available web-based demonstrator called HB Deid. HB Deid identifies so-called protected health information (PHI) in text written in Swedish and removes it, masks it, or replaces it with surrogates or pseudonyms. PHI consists of named entities such as personal names, locations, ages, phone numbers, and dates. To find PHI, HB Deid uses a CRF model trained on non-sensitive annotated Swedish text, followed by a rule-based post-processing step. The final step obscures the detected PHI in one of three ways: masking it, showing only its class name, or replacing it using a rule-based pseudonymisation system.
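The three ways of obscuring detected PHI can be illustrated with a minimal Python sketch. It is not HB Deid's implementation: it assumes the detection step has already produced character spans, and the function name `obscure`, the class labels, and the surrogate table are hypothetical names chosen for illustration only.

```python
from typing import Dict, List, Tuple

# A detected PHI span: (start, end, label), with `end` exclusive.
# The label names used here are hypothetical, not HB Deid's tag set.
Span = Tuple[int, int, str]

# Hypothetical surrogate table standing in for a rule-based
# pseudonymisation system; a real system would choose surrogates
# by rule rather than from a fixed mapping.
SURROGATES: Dict[str, str] = {
    "First_Name": "Maria",
    "Location": "Uppsala",
}


def obscure(text: str, spans: List[Span], mode: str = "mask") -> str:
    """Return `text` with every span obscured according to `mode`.

    mode="mask"      replaces each character of the span with "X"
    mode="class"     replaces the span with its class name in brackets
    mode="pseudonym" replaces the span with a surrogate of the same class,
                     falling back to the class name if none is known
    """
    if mode not in {"mask", "class", "pseudonym"}:
        raise ValueError(f"unknown mode: {mode}")

    pieces: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not supported")
        pieces.append(text[cursor:start])  # untouched text before the span
        if mode == "mask":
            pieces.append("X" * (end - start))
        elif mode == "class":
            pieces.append(f"[{label}]")
        else:
            pieces.append(SURROGATES.get(label, f"[{label}]"))
        cursor = end
    pieces.append(text[cursor:])  # untouched text after the last span
    return "".join(pieces)


example = "Karin bor i Lund."
detected = [(0, 5, "First_Name"), (12, 16, "Location")]
masked = obscure(example, detected, "mask")
classes = obscure(example, detected, "class")
pseudo = obscure(example, detected, "pseudonym")
```

In this sketch, masking preserves the length of the original text, the class-name mode keeps the entity type visible to the reader, and the pseudonym mode keeps the text readable at the cost of needing a surrogate source for each class.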