| garak: A Framework for Security Probing Large Language Models | Jun 16, 2024 | Red Teaming | Code Available | 9 | 5 |
| PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System | Oct 1, 2024 | Red Teaming | Code Available | 7 | 5 |
| Seamless: Multilingual Expressive and Streaming Speech Translation | Dec 8, 2023 | automatic-speech-translation, Machine Translation | Code Available | 6 | 5 |
| HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal | Feb 6, 2024 | Red Teaming | Code Available | 4 | 5 |
| AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | Jul 17, 2024 | Autonomous Driving, Backdoor Attack | Code Available | 3 | 5 |
| AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs | Oct 3, 2024 | Red Teaming | Code Available | 3 | 5 |
| Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned | Aug 23, 2022 | Language Modelling, Red Teaming | Code Available | 3 | 5 |
| Curiosity-driven Red-teaming for Large Language Models | Feb 29, 2024 | Red Teaming, Reinforcement Learning (RL) | Code Available | 2 | 5 |
| ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming | Apr 6, 2024 | Adversarial Robustness, Dialogue Safety Prediction | Code Available | 2 | 5 |
| Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! | Oct 5, 2023 | Red Teaming, Safety Alignment | Code Available | 2 | 5 |