SOTAVerified

Red Teaming

Papers

Showing 11–20 of 251 papers

| Title | Status | Hype |
| --- | --- | --- |
| Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models | Code | 2 |
| WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models | Code | 2 |
| Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt | Code | 2 |
| Improved Techniques for Optimization-Based Jailbreaking on Large Language Models | Code | 2 |
| AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs | Code | 2 |
| ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming | Code | 2 |
| Against The Achilles' Heel: A Survey on Red Teaming for Generative Models | Code | 2 |
| Curiosity-driven Red-teaming for Large Language Models | Code | 2 |
| Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Code | 2 |
| Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! | Code | 2 |
Page 2 of 26

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SUDO | Attack Success Rate | 41 | — | Unverified |