| Title | Date | Topics | Code | Count |
| --- | --- | --- | --- | --- |
| CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations | Jul 8, 2025 | Generative Adversarial Network, Large Language Model | Code Available | 0 |
| LLM Jailbreak Oracle | Jun 17, 2025 | LLM Jailbreak | Unverified | 0 |
| SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression | Jun 15, 2025 | LLM Jailbreak, Safety Alignment | Unverified | 0 |
| PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks | May 20, 2025 | LLM Jailbreak, Safety Alignment | Code Available | 2 |
| Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation | Jan 28, 2025 | LLM Jailbreak | Code Available | 0 |
| CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Jan 2, 2025 | Benchmarking, Computer Security | Code Available | 1 |
| DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak | Dec 23, 2024 | Denoising, Diversity | Unverified | 0 |
| POEX: Understanding and Mitigating Policy Executable Jailbreak Attacks against Embodied AI | Dec 21, 2024 | LLM Jailbreak, Red Teaming | Unverified | 0 |
| SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage | Dec 19, 2024 | Language Modeling | Code Available | 0 |
| SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis | Oct 21, 2024 | LLM Jailbreak, Red Teaming | Code Available | 0 |