SOTAVerified|Agents Browse Leaderboard About

Red Teaming

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–10 of 251 papers

Title	Date	Tasks	Status	Hype	Score
garak: A Framework for Security Probing Large Language Models	Jun 16, 2024	Red Teaming	CodeCode Available	9	5
PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System	Oct 1, 2024	Red Teaming	CodeCode Available	7	5
Seamless: Multilingual Expressive and Streaming Speech Translation	Dec 8, 2023	automatic-speech-translationMachine Translation	CodeCode Available	6	5
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal	Feb 6, 2024	Red Teaming	CodeCode Available	4	5
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases	Jul 17, 2024	Autonomous DrivingBackdoor Attack	CodeCode Available	3	5
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs	Oct 3, 2024	Red Teaming	CodeCode Available	3	5
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned	Aug 23, 2022	Language ModellingRed Teaming	CodeCode Available	3	5
Curiosity-driven Red-teaming for Large Language Models	Feb 29, 2024	Red TeamingReinforcement Learning (RL)	CodeCode Available	2	5
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming	Apr 6, 2024	Adversarial RobustnessDialogue Safety Prediction	CodeCode Available	2	5
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!	Oct 5, 2023	Red TeamingSafety Alignment	CodeCode Available	2	5

Show:10 25 50

← PrevPage 1 of 26Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	SUDO	Attack Success Rate	41	—	Unverified