| Title | Date | Tags | Code | # |
| --- | --- | --- | --- | --- |
| GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts | Sep 19, 2023 | Red Teaming | Code Available | 2 |
| GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher | Aug 12, 2023 | Ethics, Red Teaming | Code Available | 2 |
| RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments | May 28, 2025 | Benchmarking, Red Teaming | Code Available | 1 |
| MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming | May 22, 2025 | Red Teaming, Safety Alignment | Code Available | 1 |
| OET: Optimization-based prompt injection Evaluation Toolkit | May 1, 2025 | Adversarial Robustness, Natural Language Understanding | Code Available | 1 |
| RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search | Apr 21, 2025 | Diversity, Evolutionary Algorithms | Code Available | 1 |
| sudo rm -rf agentic_security | Mar 26, 2025 | Adversarial Attack, AI and Safety | Code Available | 1 |
| Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training | Mar 24, 2025 | Diversity, Large Language Model | Code Available | 1 |
| UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning | Feb 28, 2025 | Large Language Model, Red Teaming | Code Available | 1 |
| Understanding and Enhancing the Transferability of Jailbreaking Attacks | Feb 5, 2025 | Intent Recognition, Red Teaming | Code Available | 1 |