SOTAVerified

Red Teaming

Papers

Showing 111-120 of 251 papers

Title | Status | Hype
A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI | | 0
Fast Proxies for LLM Robustness Evaluation | | 0
JAB: Joint Adversarial Prompting and Belief Augmentation | | 0
Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation | | 0
Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming | | 0
Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters | | 0
A Framework for Evaluating Emerging Cyberattack Capabilities of AI | | 0
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency | | 0
Exploring Straightforward Conversational Red-Teaming | | 0
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SUDO | Attack Success Rate | 41 | | Unverified