| JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models | Jun 26, 2024 | LLM Jailbreak, Survey | Code Available | 2 | 5 |
| Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues | Oct 14, 2024 | LLM Jailbreak, Safety Alignment | Code Available | 2 | 5 |
| JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks | Apr 3, 2024 | LLM Jailbreak | Code Available | 2 | 5 |
| PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks | May 20, 2025 | LLM Jailbreak, Safety Alignment | Code Available | 2 | 5 |
| Cognitive Overload Attack: Prompt Injection for Long Context | Oct 15, 2024 | In-Context Learning, LLM Jailbreak | Code Available | 1 | 5 |
| Automatic Prompt Optimization with "Gradient Descent" and Beam Search | May 4, 2023 | LLM Jailbreak | Code Available | 1 | 5 |
| CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Jan 2, 2025 | Benchmarking, Computer Security | Code Available | 1 | 5 |
| CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations | Jul 8, 2025 | Generative Adversarial Network, Large Language Model | Code Available | 0 | 5 |
| Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation | Jan 28, 2025 | LLM Jailbreak | Code Available | 0 | 5 |
| Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization | May 15, 2024 | LLM Jailbreak | Code Available | 0 | 5 |