
StereoSet: Measuring stereotypical bias in pretrained language models

2020-04-20 · ACL 2021 · Code Available

Moin Nadeem, Anna Bethke, Siva Reddy


Abstract

A stereotype is an over-generalized belief about a particular group of people, e.g., Asians are good at math or Asians are bad drivers. Such beliefs (biases) are known to hurt target groups. Since pretrained language models are trained on large real-world data, they are known to capture stereotypical biases. To assess the adverse effects of these models, it is important to quantify the bias captured in them. Existing literature on quantifying bias evaluates pretrained language models on a small set of artificially constructed bias-assessing sentences. We present StereoSet, a large-scale natural dataset in English to measure stereotypical biases in four domains: gender, profession, race, and religion. We evaluate popular models like BERT, GPT-2, RoBERTa, and XLNet on our dataset and show that these models exhibit strong stereotypical biases. We also present a leaderboard with a hidden test set to track the bias of future language models at https://stereoset.mit.edu
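For readers who want to inspect the data, here is a minimal sketch of loading the public development split. It assumes the dataset is mirrored on the Hugging Face Hub under the id stereoset with an intersentence configuration; the split and field names are assumptions about that release, not something specified on this page:

```python
# Minimal sketch: inspect StereoSet via the Hugging Face `datasets` library.
# The hub id "stereoset", the "intersentence" config, the "validation" split,
# and the field names are assumptions about the public release.
from datasets import load_dataset

dev = load_dataset("stereoset", "intersentence", split="validation")

example = dev[0]
print(example["bias_type"])  # one of: gender, profession, race, religion
print(example["context"])    # the context sentence

# Each context comes with stereotype / anti-stereotype / unrelated candidates.
for sentence, label in zip(example["sentences"]["sentence"],
                           example["sentences"]["gold_label"]):
    print(label, sentence)
```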


Benchmark Results

Dataset     Model            Metric       Claimed   Verified   Status
StereoSet   GPT-2 (small)    ICAT Score   72.97     -          Unverified
StereoSet   XLNet (large)    ICAT Score   72.03     -          Unverified
StereoSet   GPT-2 (medium)   ICAT Score   71.73     -          Unverified
StereoSet   BERT (base)      ICAT Score   71.21     -          Unverified
StereoSet   GPT-2 (large)    ICAT Score   70.54     -          Unverified
StereoSet   BERT (large)     ICAT Score   69.89     -          Unverified
StereoSet   RoBERTa (base)   ICAT Score   67.5      -          Unverified
StereoSet   XLNet (base)     ICAT Score   62.1      -          Unverified
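For context on the Claimed column: the paper's ICAT (Idealized CAT) score combines a language modeling score (lms, the percentage of instances where a model prefers a meaningful over an unrelated association) with a stereotype score (ss, the percentage where it prefers the stereotypical over the anti-stereotypical association). A short sketch of the combination, following the paper's definition icat = lms · min(ss, 100 − ss) / 50:

```python
# ICAT (Idealized CAT) score as defined in the StereoSet paper:
#   icat = lms * min(ss, 100 - ss) / 50
def icat(lms: float, ss: float) -> float:
    """Combine language modeling score and stereotype score.

    lms: percentage of instances where the model prefers a meaningful
         (stereotypical or anti-stereotypical) association over an
         unrelated one; 100 is ideal.
    ss:  percentage of instances where the model prefers the stereotypical
         over the anti-stereotypical association; 50 is the unbiased ideal.
    """
    return lms * min(ss, 100.0 - ss) / 50.0

# An idealized model (lms=100, ss=50) scores 100; a model that always picks
# the stereotype (or always the anti-stereotype) scores 0.
print(icat(100.0, 50.0))   # 100.0
print(icat(100.0, 100.0))  # 0.0
```

A table entry like 72.97 for GPT-2 (small) therefore reflects both how reliably the model ranks meaningful associations and how close its stereotype preference is to an even 50/50 split.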

Reproductions