SOTAVerified

Red Teaming

Papers

Showing 231-240 of 251 papers

Title | Status | Hype
Embodied Red Teaming for Auditing Robotic Foundation Models | | 0
Finding Safety Neurons in Large Language Models | | 0
ELAB: Extensive LLM Alignment Benchmark in Persian Language | | 0
FLIRT: Feedback Loop In-context Red Teaming | | 0
Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols | | 0
Effective Red-Teaming of Policy-Adherent Agents | | 0
DMRL: Data- and Model-aware Reward Learning for Data Extraction | | 0
Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning | | 0
GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization | | 0
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SUDO | Attack Success Rate | 41 | | Unverified