| Title | Date | Tags | Code | ★ |
|---|---|---|---|---|
| Cognitive Overload Attack: Prompt Injection for Long Context | Oct 15, 2024 | In-Context Learning, LLM Jailbreak | Code Available | 1 |
| Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues | Oct 14, 2024 | LLM Jailbreak, Safety Alignment | Code Available | 2 |
| Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks | Oct 5, 2024 | LLM Jailbreak | Unverified | 0 |
| HSF: Defending against Jailbreak Attacks with Hidden State Filtering | Aug 31, 2024 | LLM Jailbreak | Unverified | 0 |
| Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Carrier Articles | Aug 20, 2024 | Articles, Language Modeling | Unverified | 0 |
| JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models | Jun 26, 2024 | LLM Jailbreak, Survey | Code Available | 2 |
| SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner | Jun 8, 2024 | Adversarial Attack, LLM Jailbreak | Unverified | 0 |
| Efficient Indirect LLM Jailbreak via Multimodal-LLM Jailbreak | May 30, 2024 | Language Modeling | Unverified | 0 |
| WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response | May 22, 2024 | LLM Jailbreak, Safety Alignment | Unverified | 0 |
| Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization | May 15, 2024 | LLM Jailbreak | Code Available | 0 |