SOTAVerified

Bias Detection

Bias detection is the task of detecting and measuring racism, sexism and otherwise discriminatory behavior in a model (Source: https://stereoset.mit.edu/)

Papers

Showing 2650 of 199 papers

TitleStatusHype
A Review of the Challenges with Massive Web-mined Corpora Used in Large Language Models Pre-Training0
BENN: Bias Estimation Using Deep Neural Network0
Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models0
Beyond Explanation: A Case for Exploratory Text Visualizations of Non-Aggregated, Annotated Datasets0
BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs0
BEADs: Bias Evaluation Across Domains0
Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector0
Constructive Interpretability with CoLabel: Corroborative Integration, Complementary Features, and Collaborative Learning0
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema0
A Novel Method for News Article Event-Based Embedding0
A Keyword Based Approach to Understanding the Overpenalization of Marginalized Groups by English Marginal Abuse Models on Twitter0
Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs0
Annotating and Analyzing Biased Sentences in News Articles using Crowdsourcing0
Auditing Predictive Models for Intersectional Biases0
Accurate Uncertainty Estimation and Decomposition in Ensemble Learning0
Can we Debias Social Stereotypes in AI-Generated Images? Examining Text-to-Image Outputs and User Perceptions0
Auditing Algorithmic Fairness in Machine Learning for Health with Severity-Based LOGAN0
Can We Trust AI Agents? A Case Study of an LLM-Based Multi-Agent System for Ethical AI0
Cascading Adversarial Bias from Injection to Distillation in Language Models0
Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021): Workshop and Shared Task Report0
ChatGPT v.s. Media Bias: A Comparative Study of GPT-3.5 and Fine-tuned Language Models0
Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers0
Cognitive Bias Detection Using Advanced Prompt Engineering0
Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media0
Auditing a Dutch Public Sector Risk Profiling Algorithm Using an Unsupervised Bias Detection Tool0
Show:102550
← PrevPage 2 of 8Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1GPT-2 (small)ICAT Score72.97Unverified
2XLNet (large)ICAT Score72.03Unverified
3GPT-2 (medium)ICAT Score71.73Unverified
4BERT (base)ICAT Score71.21Unverified
5GPT-2 (large)ICAT Score70.54Unverified
6BERT (large)ICAT Score69.89Unverified
7RoBERTa (base)ICAT Score67.5Unverified
8GAL 120BICAT Score65.6Unverified
9XLNet (base)ICAT Score62.1Unverified
10GPT-3 (text-davinci-002)ICAT Score60.8Unverified
#ModelMetricClaimedVerifiedStatus
1GPT-4Best-of0.5Unverified
2GemmaBest-of0.41Unverified
3BaselineBest-of0.41Unverified
4MistralBest-of0.36Unverified
5Llama2Best-of0.34Unverified
#ModelMetricClaimedVerifiedStatus
1BADICAT Score23.44Unverified
#ModelMetricClaimedVerifiedStatus
1RandomForest_default_hyperparametersAccuracy (%)49Unverified
#ModelMetricClaimedVerifiedStatus
1RoBERTa+ALBERTF170.4Unverified