| Title | Date | Tags | Code | # |
| --- | --- | --- | --- | --- |
| GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts | Sep 19, 2023 | Red Teaming | Code Available | 2 |
| GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher | Aug 12, 2023 | Ethics, Red Teaming | Code Available | 2 |
| RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments | May 28, 2025 | Benchmarking, Red Teaming | Code Available | 1 |
| MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming | May 22, 2025 | Red Teaming, Safety Alignment | Code Available | 1 |
| OET: Optimization-based prompt injection Evaluation Toolkit | May 1, 2025 | Adversarial Robustness, Natural Language Understanding | Code Available | 1 |
| RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search | Apr 21, 2025 | Diversity, Evolutionary Algorithms | Code Available | 1 |
| sudo rm -rf agentic_security | Mar 26, 2025 | Adversarial Attack, AI and Safety | Code Available | 1 |
| Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training | Mar 24, 2025 | Diversity, Large Language Model | Code Available | 1 |
| UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning | Feb 28, 2025 | Large Language Model, Red Teaming | Code Available | 1 |
| Understanding and Enhancing the Transferability of Jailbreaking Attacks | Feb 5, 2025 | Intent Recognition, Red Teaming | Code Available | 1 |