SOTAVerified

Red Teaming

Papers

Showing 21–30 of 251 papers

| Title | Status | Hype |
| --- | --- | --- |
| GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts | Code | 2 |
| GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher | Code | 2 |
| RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments | Code | 1 |
| MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming | Code | 1 |
| OET: Optimization-based prompt injection Evaluation Toolkit | Code | 1 |
| RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search | Code | 1 |
| sudo rm -rf agentic_security | Code | 1 |
| Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training | Code | 1 |
| UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning | Code | 1 |
| Understanding and Enhancing the Transferability of Jailbreaking Attacks | Code | 1 |
Page 3 of 26

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SUDO | Attack Success Rate | 41 | — | Unverified |