SOTAVerified

Red Teaming

Papers

Showing 1–10 of 251 papers

| Title | Status | Hype |
|---|---|---|
| garak: A Framework for Security Probing Large Language Models | Code | 9 |
| PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System | Code | 7 |
| Seamless: Multilingual Expressive and Streaming Speech Translation | Code | 6 |
| HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal | Code | 4 |
| AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs | Code | 3 |
| AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | Code | 3 |
| Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned | Code | 3 |
| Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation | Code | 2 |
| LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet | Code | 2 |
| Tamper-Resistant Safeguards for Open-Weight LLMs | Code | 2 |
Page 1 of 26

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SUDO | Attack Success Rate | 41 | — | Unverified |