| Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming | Jun 17, 2024 | DiversityRed Teaming | —Unverified | 0 |
| "Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak | Jun 17, 2024 | Red Teaming | CodeCode Available | 1 |
| STAR: SocioTechnical Approach to Red Teaming Language Models | Jun 17, 2024 | Red Teaming | —Unverified | 0 |
| CELL your Model: Contrastive Explanations for Large Language Models | Jun 17, 2024 | Red TeamingText Generation | —Unverified | 0 |
| garak: A Framework for Security Probing Large Language Models | Jun 16, 2024 | Red Teaming | CodeCode Available | 9 |
| MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models | Jun 11, 2024 | Red Teaming | CodeCode Available | 1 |
| Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt | Jun 6, 2024 | Language ModellingLarge Language Model | CodeCode Available | 2 |
| Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits | Jun 3, 2024 | Red Teaming | CodeCode Available | 1 |
| Improved Techniques for Optimization-Based Jailbreaking on Large Language Models | May 31, 2024 | Red Teaming | CodeCode Available | 2 |
| Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters | May 30, 2024 | Red Teaming | —Unverified | 0 |