SOTAVerified

Red Teaming

Papers

Showing 171–180 of 251 papers

Title	Date	Tasks	Status	Hype
IterAlign: Iterative Constitutional Alignment of Large Language Models	Mar 27, 2024	Red Teaming	Unverified	0
JAB: Joint Adversarial Prompting and Belief Augmentation	Nov 16, 2023	Red Teaming	Unverified	0
Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts	Nov 15, 2023	Adversarial Attack, Red Teaming	Unverified	0
Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters	May 30, 2024	Red Teaming	Unverified	0
Red Teaming AI Policy: A Taxonomy of Avoision and the EU AI Act	Jun 2, 2025	Red Teaming	Unverified	0
Red Teaming Contemporary AI Models: Insights from Spanish and Basque Perspectives	Mar 13, 2025	Red Teaming	Unverified	0
Red-Teaming for Generative AI: Silver Bullet or Security Theater?	Jan 29, 2024	Red Teaming	Unverified	0
Red Teaming Generative AI/NLP, the BB84 quantum cryptography protocol and the NIST-approved Quantum-Resistant Cryptographic Algorithms	Sep 17, 2023	Red Teaming	Unverified	0
Red Teaming Large Language Models for Healthcare	May 1, 2025	Language Modeling, Language Modelling	Unverified	0
Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI	Mar 12, 2024	Hyperspectral image analysis, HYPERVIEW Challenge	Unverified	0

Page 18 of 26

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	SUDO	Attack Success Rate	41	—	Unverified