Red Teaming

Papers

Showing 1–10 of 251 papers

Title	Date	Tasks	Status	Hype
garak: A Framework for Security Probing Large Language Models	Jun 16, 2024	Red Teaming	Code Available	9
PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System	Oct 1, 2024	Red Teaming	Code Available	7
Seamless: Multilingual Expressive and Streaming Speech Translation	Dec 8, 2023	automatic-speech-translation, Machine Translation	Code Available	6
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal	Feb 6, 2024	Red Teaming	Code Available	4
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs	Oct 3, 2024	Red Teaming	Code Available	3
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases	Jul 17, 2024	Autonomous Driving, Backdoor Attack	Code Available	3
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned	Aug 23, 2022	Language Modelling, Red Teaming	Code Available	3
Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation	Jan 29, 2025	Red Teaming, Safety Alignment	Code Available	2
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet	Aug 27, 2024	Language Modeling, Language Modelling	Code Available	2
Tamper-Resistant Safeguards for Open-Weight LLMs	Aug 1, 2024	Red Teaming, TAR	Code Available	2

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	SUDO	Attack Success Rate	41	—	Unverified