| SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming | Aug 14, 2024 | Red Teaming, Safety Alignment | Unverified | 0 |
| EnJa: Ensemble Jailbreak on Large Language Models | Aug 7, 2024 | Safety Alignment | Unverified | 0 |
| Can Large Language Models Automatically Jailbreak GPT-4V? | Jul 23, 2024 | Face Recognition, In-Context Learning | Unverified | 0 |
| Failures to Find Transferable Image Jailbreaks Between Vision-Language Models | Jul 21, 2024 | Instruction Following, Language Modelling | Unverified | 0 |
| The Better Angels of Machine Personality: How Personality Relates to LLM Safety | Jul 17, 2024 | Fairness, Safety Alignment | Code Available | 0 |
| Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture | Jul 10, 2024 | Safety Alignment | Unverified | 0 |
| Jailbreak Attacks and Defenses Against Large Language Models: A Survey | Jul 5, 2024 | Code Completion, Question Answering | Unverified | 0 |
| LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models | Jul 3, 2024 | Safety Alignment | Unverified | 0 |
| SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters | Jul 2, 2024 | Red Teaming, Safety Alignment | Code Available | 0 |
| SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance | Jun 26, 2024 | Safety Alignment | Code Available | 0 |