SOTAVerified

Red Teaming

Papers

Showing 61–70 of 251 papers

Title | Status | Hype
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts | Code | 1
Explore, Establish, Exploit: Red Teaming Language Models from Scratch | Code | 1
Gandalf the Red: Adaptive Security for LLMs | Code | 1
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner | Code | 1
DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints | Code | 1
RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking | Code | 1
Red Teaming Language Model Detectors with Language Models | Code | 1
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique | Code | 1
Defending Against Unforeseen Failure Modes with Latent Adversarial Training | Code | 1
SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models | Code | 1
Page 7 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SUDO | Attack Success Rate | 41 | - | Unverified