| Locking Down the Finetuned LLMs Safety | Oct 14, 2024 | Safety Alignment | Code Available | 1 | 5 |
| QueryAttack: Jailbreaking Aligned Large Language Models Using Structured Non-natural Query Language | Feb 13, 2025 | Safety Alignment | Code Available | 1 | 5 |
| SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types | Oct 29, 2024 | Language Modelling | Code Available | 1 | 5 |
| LookAhead Tuning: Safer Language Models via Partial Answer Previews | Mar 24, 2025 | Position, Safety Alignment | Code Available | 1 | 5 |
| Can Editing LLMs Inject Harm? | Jul 29, 2024 | Fairness, General Knowledge | Code Available | 1 | 5 |
| DiaBlo: Diagonal Blocks Are Sufficient For Finetuning | Jun 3, 2025 | Arithmetic Reasoning, Code Generation | Code Available | 0 | 5 |
| Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models | Oct 7, 2024 | Language Modelling | Code Available | 0 | 5 |
| PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling | Feb 4, 2025 | Safety Alignment | Code Available | 0 | 5 |
| Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models | Oct 31, 2024 | Red Teaming, Safety Alignment | Code Available | 0 | 5 |
| Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding | Jun 17, 2024 | 16k, Language Modelling | Code Available | 0 | 5 |