| Title | Date | Topics | Code | Count |
| --- | --- | --- | --- | --- |
| Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model | May 10, 2025 | Safety Alignment | Code Available | 0 |
| NeuRel-Attack: Neuron Relearning for Safety Disalignment in Large Language Models | Apr 29, 2025 | Safety Alignment | Unverified | 0 |
| SAGE: A Generic Framework for LLM Safety Evaluation | Apr 28, 2025 | Red Teaming, Safety Alignment | Code Available | 0 |
| What's Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift | Apr 28, 2025 | Attribute, Data Poisoning | Unverified | 0 |
| AI Awareness | Apr 25, 2025 | Safety Alignment | Unverified | 0 |
| DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models | Apr 25, 2025 | Disentanglement, Safety Alignment | Code Available | 0 |
| aiXamine: Simplified LLM Safety and Security | Apr 21, 2025 | Adversarial Robustness | Unverified | 0 |
| Towards NSFW-Free Text-to-Image Generation via Safety-Constraint Direct Preference Optimization | Apr 19, 2025 | Contrastive Learning, Image Generation | Unverified | 0 |
| Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models | Apr 18, 2025 | Safety Alignment | Unverified | 0 |
| VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization | Apr 17, 2025 | Multimodal Reasoning, Safety Alignment | Unverified | 0 |