Bias Detection

Bias detection is the task of detecting and measuring racism, sexism and otherwise discriminatory behavior in a model (Source: https://stereoset.mit.edu/)

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 176–199 of 199 papers

Title	Date	Tasks	Status
Decoding News Bias: Multi Bias Detection in News Articles	Jan 5, 2025	ArticlesBias Detection	—Unverified
Decoding News Narratives: A Critical Analysis of Large Language Models in Framing Detection	Feb 18, 2024	Bias Detection	—Unverified
Deep Learning for Bias Detection: From Inception to Deployment	Oct 12, 2021	Bias DetectionDeep Learning	—Unverified
Quantifying Risk Propensities of Large Language Models: Ethical Focus and Bias Detection through Role-Play	Oct 26, 2024	Bias DetectionDecision Making	—Unverified
Designing Tools for Semi-Automated Detection of Machine Learning Biases: An Interview Study	Mar 13, 2020	Bias DetectionBIG-bench Machine Learning	—Unverified
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema	Apr 16, 2021	Artifact DetectionBias Detection	—Unverified
Detecting Gender Bias in Transformer-based Models: A Case Study on BERT	Oct 15, 2021	Bias DetectionGender Bias Detection	—Unverified
Detecting Linguistic Bias in Government Documents Using Large language Models	Feb 19, 2025	Bias Detection	—Unverified
The Impact of Unstated Norms in Bias Analysis of Language Models	Apr 4, 2024	Bias Detectioncounterfactual	—Unverified
Detecting Political Bias in News Articles Using Headline Attention	Aug 1, 2019	ArticlesBias Detection	—Unverified
Robots Enact Malignant Stereotypes	Jul 23, 2022	Bias DetectionGender Bias Detection	—Unverified
Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks	Feb 16, 2022	Bias DetectionOpen-Domain Dialog	—Unverified
Unsupervised Bias Detection in College Student Newspapers	Sep 11, 2023	Bias DetectionLanguage Modeling	—Unverified
Sample Complexity of Bias Detection with Subsampled Point-to-Subspace Distances	Feb 4, 2025	Bias Detection	—Unverified
DocNet: Semantic Structure in Inductive Bias Detection Models	Jun 16, 2024	ArticlesBias Detection	—Unverified
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation	Dec 4, 2024	Bias DetectionDisentanglement	—Unverified
Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms	Jul 4, 2024	Bias DetectionTask 2	—Unverified
Efficient Fairness Testing in Large Language Models: Prioritizing Metamorphic Relations for Bias Detection	May 9, 2025	Bias DetectionDiversity	—Unverified
Efficient Gender Debiasing of Pre-trained Indic Language Models	Sep 8, 2022	Bias DetectionCultural Vocal Bursts Intensity Prediction	—Unverified
Enhancing Bias Detection in Political News Using Pragmatic Presupposition	Jul 1, 2020	ArticlesBias Detection	—Unverified
Mitigating the Risk of Health Inequity Exacerbated by Large Language Models	Oct 7, 2024	Bias DetectionMedical Question Answering	—Unverified
Auditing Predictive Models for Intersectional Biases	Jun 22, 2023	Bias DetectionFairness	—Unverified
Epistemological Bias As a Means for the Automated Detection of Injustices in Text	Jul 8, 2024	Bias Detection	—Unverified
Evaluating AI fairness in credit scoring with the BRIO tool	Jun 5, 2024	Bias DetectionFairness	—Unverified

Show:10 25 50

← PrevPage 8 of 8Next →

All datasets StereoSet rt-inod-bias ICAT LLM bias PlantVillage_8px Wiki Neutrality Corpus

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-2 (small)	ICAT Score	72.97	—	Unverified
2	XLNet (large)	ICAT Score	72.03	—	Unverified
3	GPT-2 (medium)	ICAT Score	71.73	—	Unverified
4	BERT (base)	ICAT Score	71.21	—	Unverified
5	GPT-2 (large)	ICAT Score	70.54	—	Unverified
6	BERT (large)	ICAT Score	69.89	—	Unverified
7	RoBERTa (base)	ICAT Score	67.5	—	Unverified
8	GAL 120B	ICAT Score	65.6	—	Unverified
9	XLNet (base)	ICAT Score	62.1	—	Unverified
10	GPT-3 (text-davinci-002)	ICAT Score	60.8	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GPT-4	Best-of	0.5	—	Unverified
2	Baseline	Best-of	0.41	—	Unverified
3	Gemma	Best-of	0.41	—	Unverified
4	Mistral	Best-of	0.36	—	Unverified
5	Llama2	Best-of	0.34	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BAD	ICAT Score	23.44	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	RandomForest_default_hyperparameters	Accuracy (%)	49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	RoBERTa+ALBERT	F1	70.4	—	Unverified