SOTAVerified

Red Teaming

Papers

Showing 131–140 of 251 papers

| Title | Status | Hype |
| --- | --- | --- |
| Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs) | Code | 1 |
| Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle | — | 0 |
| Direct Unlearning Optimization for Robust and Safe Text-to-Image Models | — | 0 |
| AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | Code | 3 |
| Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models | Code | 2 |
| ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Low-Perplexity Toxic Prompts | Code | 0 |
| The Human Factor in AI Red Teaming: Perspectives from Social and Collaborative Computing | — | 0 |
| Automated Progressive Red Teaming | Code | 0 |
| SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters | Code | 0 |
| Purple-teaming LLMs with Adversarial Defender Training | — | 0 |
Page 14 of 26

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SUDO | Attack Success Rate | 41 | — | Unverified |