| AI Awareness | Apr 25, 2025 | Safety Alignment | Unverified | 0 | 0 |
| aiXamine: Simplified LLM Safety and Security | Apr 21, 2025 | Adversarial Robustness | Unverified | 0 | 0 |
| Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification | Mar 14, 2025 | Safety Alignment | Unverified | 0 | 0 |
| Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models | Jun 2, 2025 | Safety Alignment | Unverified | 0 | 0 |
| Alignment and Safety of Diffusion Models via Reinforcement Learning and Reward Modeling: A Survey | May 23, 2025 | Active Learning, Reinforcement Learning (RL) | Unverified | 0 | 0 |
| Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data | May 15, 2025 | Malware Detection, Safety Alignment | Unverified | 0 | 0 |
| Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications | Feb 7, 2024 | Safety Alignment | Unverified | 0 | 0 |
| Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment | Feb 21, 2025 | Safety Alignment | Unverified | 0 | 0 |
| Backtracking for Safety | Mar 11, 2025 | Safety Alignment | Unverified | 0 | 0 |
| Backtracking Improves Generation Safety | Sep 22, 2024 | Language Modeling | Unverified | 0 | 0 |