| Title | Date | Topic | Code | Repos |
| --- | --- | --- | --- | --- |
| VLM-Guard: Safeguarding Vision-Language Models via Fulfilling Safety Alignment Gap | Feb 14, 2025 | Safety Alignment | Unverified | 0 |
| X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability | Feb 14, 2025 | Safety Alignment | Available | 1 |
| The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Safety Analysis | Feb 13, 2025 | Safety Alignment | Available | 3 |
| QueryAttack: Jailbreaking Aligned Large Language Models Using Structured Non-natural Query Language | Feb 13, 2025 | Safety Alignment | Available | 1 |
| Trustworthy AI: Safety, Bias, and Privacy -- A Survey | Feb 11, 2025 | Safety Alignment, Survey | Unverified | 0 |
| AI Alignment at Your Discretion | Feb 10, 2025 | Safety Alignment | Unverified | 0 |
| Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs with Minimal Human Interventions | Feb 8, 2025 | Safety Alignment | Unverified | 0 |
| Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions | Feb 6, 2025 | Safety Alignment | Available | 1 |
| STAIR: Improving Safety Alignment with Introspective Reasoning | Feb 4, 2025 | Safety Alignment | Available | 2 |
| Vulnerability Mitigation for Safety-Aligned Language Models via Debiasing | Feb 4, 2025 | Safety Alignment | Unverified | 0 |