| One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs | May 23, 2025 | AllSafety Alignment | CodeCode Available | 0 |
| Alignment and Safety of Diffusion Models via Reinforcement Learning and Reward Modeling: A Survey | May 23, 2025 | Active LearningReinforcement Learning (RL) | —Unverified | 0 |
| Shape it Up! Restoring LLM Safety during Finetuning | May 22, 2025 | Safety Alignment | —Unverified | 0 |
| CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning | May 22, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization | May 22, 2025 | Safety Alignment | CodeCode Available | 0 |
| From Evaluation to Defense: Advancing Safety in Video Large Language Models | May 22, 2025 | Safety Alignment | —Unverified | 0 |
| DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection | May 22, 2025 | QuantizationSafety Alignment | CodeCode Available | 0 |
| Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | May 21, 2025 | BenchmarkingLanguage Modeling | CodeCode Available | 0 |
| SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment | May 20, 2025 | Safety Alignment | —Unverified | 0 |
| sudoLLM : On Multi-role Alignment of Language Models | May 20, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |