SOTAVerified

Red Teaming

Papers

Showing 171–180 of 251 papers

| Title | Status | Hype |
|---|---|---|
| AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs | Code | 2 |
| CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge | | 0 |
| ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming | Code | 2 |
| Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? | Code | 0 |
| Red-Teaming Segment Anything Model | Code | 0 |
| Against The Achilles' Heel: A Survey on Red Teaming for Generative Models | Code | 2 |
| Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code | | 0 |
| IterAlign: Iterative Constitutional Alignment of Large Language Models | | 0 |
| HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback | | 0 |
| Distract Large Language Models for Automatic Jailbreak Attack | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SUDO | Attack Success Rate | 41 | | Unverified |