| Title | Date | Tags | Code | Stars |
| --- | --- | --- | --- | --- |
| Offensive Security for AI Systems: Concepts, Practices, and Applications | May 9, 2025 | Red Teaming | Unverified | 0 |
| AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents | May 9, 2025 | Navigate, Red Teaming | Unverified | 0 |
| Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods | May 8, 2025 | Red Teaming, Systematic Literature Review | Unverified | 0 |
| DMRL: Data- and Model-aware Reward Learning for Data Extraction | May 7, 2025 | Prompt Engineering, Red Teaming | Unverified | 0 |
| Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs | May 7, 2025 | Red Teaming | Unverified | 0 |
| Red Teaming Large Language Models for Healthcare | May 1, 2025 | Language Modeling | Unverified | 0 |
| When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines | Apr 29, 2025 | Red Teaming | Unverified | 0 |
| SAGE: A Generic Framework for LLM Safety Evaluation | Apr 28, 2025 | Red Teaming, Safety Alignment | Code Available | 0 |
| Understanding and Mitigating Risks of Generative AI in Financial Services | Apr 25, 2025 | Fairness, Red Teaming | Unverified | 0 |
| RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models | Apr 25, 2025 | RAG, Red Teaming | Unverified | 0 |