Bias Detection

Bias detection is the task of detecting and measuring racist, sexist, and otherwise discriminatory behavior in a model (Source: https://stereoset.mit.edu/).

Papers

Showing 1–25 of 199 papers

Title	Date	Tasks	Status	Hype
How Neural Networks Organize Concepts: Introducing Concept Trajectory Analysis for Deep Learning Interpretability	Jun 1, 2025	Bias Detection	Code Available	0
Cascading Adversarial Bias from Injection to Distillation in Language Models	May 30, 2025	Bias Detection, Code Generation	Unverified	0
Can we Debias Social Stereotypes in AI-Generated Images? Examining Text-to-Image Outputs and User Perceptions	May 27, 2025	Bias Detection	Unverified	0
Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector	May 21, 2025	Bias Detection, In-Context Learning	Unverified	0
BiasLab: Toward Explainable Political Bias Detection with Dual-Axis Annotations and Rationale Indicators	May 21, 2025	Articles, Bias Detection	Unverified	0
To Bias or Not to Bias: Detecting bias in News with bias-detector	May 19, 2025	Bias Detection, Sentence	Code Available	0
Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP	May 16, 2025	Bias Detection, Misinformation	Code Available	0
Efficient Fairness Testing in Large Language Models: Prioritizing Metamorphic Relations for Bias Detection	May 9, 2025	Bias Detection, Diversity	Unverified	0
Explainable AI in Spatial Analysis	May 1, 2025	Bias Detection, Explainable artificial intelligence	Code Available	2
BiasGuard: A Reasoning-enhanced Bias Detection Tool For Large Language Models	Apr 30, 2025	Bias Detection, Decision Making	Unverified	0
Toward Holistic Evaluation of Recommender Systems Powered by Generative Models	Apr 9, 2025	Bias Detection, Recommendation Systems	Unverified	0
Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles	Apr 4, 2025	Articles, Bias Detection	Unverified	0
STOOD-X methodology: using statistical nonparametric test for OOD Detection Large-Scale datasets enhanced with explainability	Apr 3, 2025	Bias Detection, Out of Distribution (OOD) Detection	Unverified	0
On the Mutual Influence of Gender and Occupation in LLM Representations	Mar 9, 2025	Bias Detection, Occupation prediction	Unverified	0
Fine-Grained Bias Detection in LLM: Enhancing detection mechanisms for nuanced biases	Mar 8, 2025	Bias Detection, counterfactual	Unverified	0
Cognitive Bias Detection Using Advanced Prompt Engineering	Mar 7, 2025	Bias Detection, Decision Making	Unverified	0
Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT	Feb 23, 2025	Bias Detection, Visual Reasoning	Unverified	0
Robust Bias Detection in MLMs and its Application to Human Trait Ratings	Feb 21, 2025	Bias Detection	Code Available	0
Detecting Linguistic Bias in Government Documents Using Large language Models	Feb 19, 2025	Bias Detection	Unverified	0
Towards Equitable AI: Detecting Bias in Using Large Language Models for Marketing	Feb 18, 2025	Bias Detection, Marketing	Unverified	0
BiaSWE: An Expert Annotated Dataset for Misogyny Detection in Swedish	Feb 11, 2025	Bias Detection, Specificity	Unverified	0
FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing	Feb 6, 2025	Attribute, Bias Detection	Unverified	0
LLMs can be easily Confused by Instructional Distractions	Feb 5, 2025	Bias Detection, Code Generation	Unverified	0
Sample Complexity of Bias Detection with Subsampled Point-to-Subspace Distances	Feb 4, 2025	Bias Detection	Unverified	0
Bias Detection via Maximum Subgroup Discrepancy	Feb 4, 2025	Bias Detection	Unverified	0

All datasets StereoSet rt-inod-bias ICAT LLM bias PlantVillage_8px Wiki Neutrality Corpus

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-2 (small)	ICAT Score	72.97	—	Unverified
2	XLNet (large)	ICAT Score	72.03	—	Unverified
3	GPT-2 (medium)	ICAT Score	71.73	—	Unverified
4	BERT (base)	ICAT Score	71.21	—	Unverified
5	GPT-2 (large)	ICAT Score	70.54	—	Unverified
6	BERT (large)	ICAT Score	69.89	—	Unverified
7	RoBERTa (base)	ICAT Score	67.5	—	Unverified
8	GAL 120B	ICAT Score	65.6	—	Unverified
9	XLNet (base)	ICAT Score	62.1	—	Unverified
10	GPT-3 (text-davinci-002)	ICAT Score	60.8	—	Unverified
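
For readers unfamiliar with the ICAT Score column above: in the StereoSet benchmark, the Idealized CAT (ICAT) score combines a language modeling score and a stereotype score, both expressed as percentages, so that a model scores 100 only when it always prefers meaningful completions and shows no preference between stereotypical and anti-stereotypical ones. The sketch below is a minimal illustration of that formula; the function and parameter names (`icat_score`, `lms`, `ss`) are chosen here for illustration and are not taken from this page.

```python
def icat_score(lms: float, ss: float) -> float:
    """Combine a language modeling score and a stereotype score into ICAT.

    lms: percentage of cases where the model prefers a meaningful
         completion over a meaningless one (0 to 100).
    ss:  percentage of cases where the model prefers the stereotypical
         completion over the anti-stereotypical one (0 to 100).
    """
    if not (0.0 <= lms <= 100.0 and 0.0 <= ss <= 100.0):
        raise ValueError("lms and ss must be percentages in [0, 100]")
    # An unbiased model has ss = 50; the min term penalizes deviation
    # from 50 in either direction, and dividing by 50 rescales to 0..100.
    return lms * min(ss, 100.0 - ss) / 50.0


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not values from the table.
    print(icat_score(lms=80.0, ss=60.0))
```

Under this formula a fully stereotyped model (ss of 0 or 100) scores 0 regardless of its language modeling ability, which is why a higher ICAT Score in the table is better.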

#	Model	Metric	Claimed	Verified	Status
1	GPT-4	Best-of	0.5	—	Unverified
2	Baseline	Best-of	0.41	—	Unverified
3	Gemma	Best-of	0.41	—	Unverified
4	Mistral	Best-of	0.36	—	Unverified
5	Llama2	Best-of	0.34	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BAD	ICAT Score	23.44	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	RandomForest_default_hyperparameters	Accuracy (%)	49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	RoBERTa+ALBERT	F1	70.4	—	Unverified