| Title | Date | Topics | Code | Citations |
| --- | --- | --- | --- | --- |
| SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation | Dec 13, 2024 | Image Generation, Safety Alignment | Unverified | 0 |
| No Free Lunch for Defending Against Prefilling Attack by In-Context Learning | Dec 13, 2024 | In-Context Learning, Safety Alignment | Unverified | 0 |
| Model-Editing-Based Jailbreak against Safety-aligned Large Language Models | Dec 11, 2024 | Model Editing, Safety Alignment | Unverified | 0 |
| Na'vi or Knave: Jailbreaking Language Models via Metaphorical Avatars | Dec 10, 2024 | Safety Alignment | Unverified | 0 |
| SafeWorld: Geo-Diverse Safety Alignment | Dec 9, 2024 | Safety Alignment | Code Available | 0 |
| Safety Alignment Backfires: Preventing the Re-emergence of Suppressed Concepts in Fine-tuned Text-to-Image Diffusion Models | Nov 30, 2024 | Safety Alignment | Unverified | 0 |
| PEFT-as-an-Attack! Jailbreaking Language Models during Federated Parameter-Efficient Fine-Tuning | Nov 28, 2024 | Federated Learning, Parameter-Efficient Fine-Tuning | Unverified | 0 |
| Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models | Nov 27, 2024 | Image Generation, Safety Alignment | Unverified | 0 |
| Don't Command, Cultivate: An Exploratory Study of System-2 Alignment | Nov 26, 2024 | Prompt Engineering, Safety Alignment | Code Available | 0 |
| Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine | Nov 20, 2024 | Fairness, Safety Alignment | Unverified | 0 |