SOTAVerified

Red Teaming

Papers

Showing 11–20 of 251 papers

Title | Status | Hype
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models | Code | 2
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts | Code | 2
Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt | Code | 2
Tamper-Resistant Safeguards for Open-Weight LLMs | Code | 2
Curiosity-driven Red-teaming for Large Language Models | Code | 2
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs | Code | 2
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! | Code | 2
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming | Code | 2
GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher | Code | 2
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet | Code | 2
Page 2 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SUDO | Attack Success Rate | 41 | – | Unverified