| Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations | Oct 9, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs | Oct 3, 2024 | Red Teaming | CodeCode Available | 3 |
| SteerDiff: Steering towards Safe Text-to-Image Diffusion Models | Oct 3, 2024 | Image GenerationRed Teaming | —Unverified | 0 |
| Automated Red Teaming with GOAT: the Generative Offensive Agent Tester | Oct 2, 2024 | Red Teaming | —Unverified | 0 |
| PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System | Oct 1, 2024 | Red Teaming | CodeCode Available | 7 |
| Overriding Safety protections of Open-source Models | Sep 28, 2024 | Red TeamingSafety Alignment | CodeCode Available | 0 |
| RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking | Sep 26, 2024 | Red Teaming | CodeCode Available | 1 |
| Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction | Sep 25, 2024 | DiversityRed Teaming | CodeCode Available | 1 |
| Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI | Sep 23, 2024 | Red Teaming | —Unverified | 0 |
| Jailbreaking Large Language Models with Symbolic Mathematics | Sep 17, 2024 | Red Teaming | —Unverified | 0 |