| Title | Date | Tags | Code | # |
| --- | --- | --- | --- | --- |
| Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety | May 11, 2025 | Outlier Detection, Red Teaming | Code Available | 0 |
| Offensive Security for AI Systems: Concepts, Practices, and Applications | May 9, 2025 | Red Teaming | Unverified | 0 |
| AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents | May 9, 2025 | Navigate, Red Teaming | Unverified | 0 |
| Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods | May 8, 2025 | Red Teaming, Systematic Literature Review | Unverified | 0 |
| DMRL: Data- and Model-aware Reward Learning for Data Extraction | May 7, 2025 | Prompt Engineering, Red Teaming | Unverified | 0 |
| Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs | May 7, 2025 | Red Teaming | Unverified | 0 |
| Red Teaming Large Language Models for Healthcare | May 1, 2025 | Language Modeling | Unverified | 0 |
| OET: Optimization-based prompt injection Evaluation Toolkit | May 1, 2025 | Adversarial Robustness, Natural Language Understanding | Code Available | 1 |
| When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines | Apr 29, 2025 | Red Teaming | Unverified | 0 |
| SAGE: A Generic Framework for LLM Safety Evaluation | Apr 28, 2025 | Red Teaming, Safety Alignment | Code Available | 0 |