SOTAVerified|Agents Browse Leaderboard About

Red Teaming

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 61–70 of 251 papers

Title	Date	Tasks	Status	Hype
LLM-Safety Evaluations Lack Robustness	Mar 4, 2025	Red TeamingResponse Generation	—Unverified	0
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models	Mar 3, 2025	Red TeamingSurvey	—Unverified	0
UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning	Feb 28, 2025	Large Language ModelRed Teaming	CodeCode Available	1
Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming	Feb 22, 2025	DiversityIn-Context Learning	—Unverified	0
Fast Proxies for LLM Robustness Evaluation	Feb 14, 2025	Red Teaming	—Unverified	0
A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management	Feb 10, 2025	ManagementRed Teaming	—Unverified	0
Predictive Red Teaming: Breaking Policies Without Breaking Robots	Feb 10, 2025	Imitation LearningRed Teaming	—Unverified	0
KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs	Feb 5, 2025	DiversityPrompt Engineering	—Unverified	0
Understanding and Enhancing the Transferability of Jailbreaking Attacks	Feb 5, 2025	Intent RecognitionRed Teaming	CodeCode Available	1
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming	Jan 31, 2025	Red Teaming	—Unverified	0

Show:10 25 50

← PrevPage 7 of 26Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	SUDO	Attack Success Rate	41	—	Unverified