| Title | Date | Tags | Code |
| --- | --- | --- | --- |
| Unleashing the Unseen: Harnessing Benign Datasets for Jailbreaking Large Language Models | Oct 1, 2024 | Safety Alignment | Code Available |
| One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs | May 23, 2025 | Safety Alignment | Code Available |
| One-Shot Safety Alignment for Large Language Models via Optimal Dualization | May 29, 2024 | Safety Alignment | Code Available |
| Overriding Safety protections of Open-source Models | Sep 28, 2024 | Red Teaming, Safety Alignment | Code Available |
| BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage | Jun 3, 2025 | Prompt Engineering, Red Teaming | Code Available |
| Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization | Oct 25, 2024 | Safety Alignment | Code Available |
| Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors | Jun 12, 2025 | Question Answering, Safety Alignment | Code Available |
| Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models | Oct 7, 2024 | Language Modeling | Code Available |
| Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models | May 26, 2025 | Safety Alignment | Code Available |
| Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization | May 22, 2025 | Safety Alignment | Code Available |