| AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs | Apr 21, 2024 | MMLU, Red Teaming | Code Available | 2 |
| CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge | Apr 10, 2024 | Red Teaming | Unverified | 0 |
| ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming | Apr 6, 2024 | Adversarial Robustness, Dialogue Safety Prediction | Code Available | 2 |
| Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? | Apr 4, 2024 | Red Teaming | Code Available | 0 |
| Red-Teaming Segment Anything Model | Apr 2, 2024 | Image Segmentation, model | Code Available | 0 |
| Against The Achilles' Heel: A Survey on Red Teaming for Generative Models | Mar 31, 2024 | Red Teaming, Survey | Code Available | 2 |
| Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code | Mar 30, 2024 | Continual Pretraining, Language Modelling | Unverified | 0 |
| IterAlign: Iterative Constitutional Alignment of Large Language Models | Mar 27, 2024 | Red Teaming | Unverified | 0 |
| HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback | Mar 13, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| Distract Large Language Models for Automatic Jailbreak Attack | Mar 13, 2024 | Red Teaming | Code Available | 0 |