| Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models | Oct 31, 2024 | Red TeamingSafety Alignment | CodeCode Available | 0 | 5 |
| Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding | Jun 17, 2024 | 16kLanguage Modelling | CodeCode Available | 0 | 5 |
| A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement | Oct 17, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling | Feb 4, 2025 | Safety Alignment | CodeCode Available | 0 | 5 |
| One-Shot Safety Alignment for Large Language Models via Optimal Dualization | May 29, 2024 | Safety Alignment | CodeCode Available | 0 | 5 |
| AgentAlign: Navigating Safety Alignment in the Shift from Informative to Agentic Large Language Models | May 29, 2025 | Safety Alignment | CodeCode Available | 0 | 5 |
| One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs | May 23, 2025 | AllSafety Alignment | CodeCode Available | 0 | 5 |
| Overriding Safety protections of Open-source Models | Sep 28, 2024 | Red TeamingSafety Alignment | CodeCode Available | 0 | 5 |
| Exploiting Prefix-Tree in Structured Output Interfaces for Enhancing Jailbreak Attacking | Feb 19, 2025 | Prompt EngineeringSafety Alignment | CodeCode Available | 0 | 5 |
| Can a large language model be a gaslighter? | Oct 11, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |