SOTAVerified

Red Teaming

Papers

Showing 151-160 of 251 papers

Title | Status | Hype
Effective Red-Teaming of Policy-Adherent Agents |  | 0
ELAB: Extensive LLM Alignment Benchmark in Persian Language |  | 0
Embodied Red Teaming for Auditing Robotic Foundation Models |  | 0
EVA: Red-Teaming GUI Agents via Evolving Indirect Prompt Injection |  | 0
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity |  | 0
Exploring Straightforward Conversational Red-Teaming |  | 0
Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation |  | 0
Fast Proxies for LLM Robustness Evaluation |  | 0
Finding Safety Neurons in Large Language Models |  | 0
FLIRT: Feedback Loop In-context Red Teaming |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SUDO | Attack Success Rate | 41 |  | Unverified