| Unlocking Adversarial Suffix Optimization Without Affirmative Phrases: Efficient Black-box Jailbreaking via LLM as Optimizer | Aug 21, 2024 | Safety Alignment | CodeCode Available | 0 |
| Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding | Jun 17, 2024 | 16kLanguage Modelling | CodeCode Available | 0 |
| SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage | Dec 19, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 |
| SAGE: A Generic Framework for LLM Safety Evaluation | Apr 28, 2025 | Red TeamingSafety Alignment | CodeCode Available | 0 |
| SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings | Feb 18, 2025 | GPUSafety Alignment | CodeCode Available | 0 |
| The Better Angels of Machine Personality: How Personality Relates to LLM Safety | Jul 17, 2024 | FairnessSafety Alignment | CodeCode Available | 0 |
| TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis | May 30, 2025 | DiversityLanguage Modeling | CodeCode Available | 0 |
| Can a large language model be a gaslighter? | Oct 11, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 |