HONEST: Measuring Hurtful Sentence Completion in Language Models
Debora Nozza, Federico Bianchi, Dirk Hovy
Code Available — Be the first to reproduce this paper.
ReproduceCode
- github.com/milanlproc/honestIn papernone★ 21
Abstract
Language models have revolutionized the field of NLP. However, language models capture and proliferate hurtful stereotypes, especially in text generation. Our results show that 4.3\% of the time, language models complete a sentence with a hurtful word. These cases are not random, but follow language and gender-specific patterns. We propose a score to measure hurtful sentence completions in language models (HONEST). It uses a systematic template- and lexicon-based bias evaluation methodology for six languages. Our findings suggest that these models replicate and amplify deep-seated societal stereotypes about gender roles. Sentence completions refer to sexual promiscuity when the target is female in 9\% of the time, and in 4\% to homosexuality when the target is male. The results raise questions about the use of these models in production settings.