| CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge | Apr 10, 2024 | Red Teaming | —Unverified | 0 | 0 |
| JBFuzz: Jailbreaking LLMs Efficiently and Effectively Using Fuzzing | Mar 12, 2025 | Red TeamingSafety Alignment | —Unverified | 0 | 0 |
| KDA: A Knowledge-Distilled Attacker for Generating Diverse Prompts to Jailbreak LLMs | Feb 5, 2025 | DiversityPrompt Engineering | —Unverified | 0 | 0 |
| Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges | Mar 6, 2025 | BenchmarkingLanguage Modeling | —Unverified | 0 | 0 |
| LARGO: Latent Adversarial Reflection through Gradient Optimization for Jailbreaking LLMs | May 16, 2025 | Red Teaming | —Unverified | 0 | 0 |
| CTI4AI: Threat Intelligence Generation and Sharing after Red Teaming AI Models | Aug 16, 2022 | Red Teaming | —Unverified | 0 | 0 |
| CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring | May 29, 2025 | Red Teaming | —Unverified | 0 | 0 |
| Conversational Complexity for Assessing Risk in Large Language Models | Sep 2, 2024 | Red Teaming | —Unverified | 0 | 0 |
| Learning from Red Teaming: Gender Bias Provocation and Mitigation in Large Language Models | Oct 17, 2023 | In-Context LearningRed Teaming | —Unverified | 0 | 0 |
| Lessons From Red Teaming 100 Generative AI Products | Jan 13, 2025 | BenchmarkingRed Teaming | —Unverified | 0 | 0 |