SOTAVerified

Red Teaming

Papers

Showing 31–40 of 251 papers

| Title | Status | Hype |
|---|---|---|
| Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety | Code | 0 |
| Offensive Security for AI Systems: Concepts, Practices, and Applications | | 0 |
| AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents | | 0 |
| Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods | | 0 |
| DMRL: Data- and Model-aware Reward Learning for Data Extraction | | 0 |
| Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs | | 0 |
| Red Teaming Large Language Models for Healthcare | | 0 |
| OET: Optimization-based prompt injection Evaluation Toolkit | Code | 1 |
| When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines | | 0 |
| SAGE: A Generic Framework for LLM Safety Evaluation | Code | 0 |
Page 4 of 26

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SUDO | Attack Success Rate | 41 | | Unverified |