| Title | Date | Topics | Code |
|---|---|---|---|
| Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks | Jan 18, 2025 | Safety Alignment | Code Available |
| StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models | Feb 17, 2025 | Safety Alignment | Code Available |
| Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | May 21, 2025 | Benchmarking, Language Modeling | Code Available |
| Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models | Oct 31, 2024 | Red Teaming, Safety Alignment | Code Available |
| Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities | Oct 24, 2024 | Safety Alignment | Code Available |
| How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities | Nov 15, 2023 | Ethics, Fairness | Code Available |
| DiaBlo: Diagonal Blocks Are Sufficient For Finetuning | Jun 3, 2025 | Arithmetic Reasoning, Code Generation | Code Available |
| VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration | May 26, 2025 | Language Modeling | Code Available |
| SafeWorld: Geo-Diverse Safety Alignment | Dec 9, 2024 | Safety Alignment | Code Available |
| Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models | Oct 7, 2024 | Language Modeling | Code Available |