SOTAVerified

Red Teaming

Papers

Showing 11–20 of 251 papers (page 2 of 26)

| Title | Status | Hype |
|---|---|---|
| Improved Techniques for Optimization-Based Jailbreaking on Large Language Models | Code | 2 |
| Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt | Code | 2 |
| GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher | Code | 2 |
| Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Code | 2 |
| Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! | Code | 2 |
| AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs | Code | 2 |
| Tamper-Resistant Safeguards for Open-Weight LLMs | Code | 2 |
| ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming | Code | 2 |
| GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts | Code | 2 |
| LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet | Code | 2 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SUDO | Attack Success Rate | 41 | | Unverified |
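For context, attack success rate (ASR) in jailbreak benchmarks is conventionally the fraction of adversarial prompts judged to elicit a harmful response, usually reported as a percentage. A minimal sketch of that computation is below; the function name and the use of per-prompt boolean judgments are illustrative assumptions, since each benchmark defines its own judging procedure:

```python
def attack_success_rate(judgments: list[bool]) -> float:
    """Percentage of adversarial prompts judged as successful attacks.

    `judgments` holds one boolean per attacked prompt: True if a judge
    (human or model-based; benchmark-specific) marked the model's
    response as a successful jailbreak. Hypothetical helper, not the
    verification code used by this site.
    """
    if not judgments:
        return 0.0
    return 100.0 * sum(judgments) / len(judgments)

# Illustration only: 41 successes out of 100 prompts gives an ASR of 41.0.
print(attack_success_rate([True] * 41 + [False] * 59))  # 41.0
```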