| CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge | Apr 10, 2024 | Red Teaming | —Unverified | 0 |
| CURE: Concept Unlearning via Orthogonal Representation Editing in Diffusion Models | May 19, 2025 | Benchmarking, Red Teaming | —Unverified | 0 |
| DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions | Dec 7, 2023 | Code Generation, Red Teaming | —Unverified | 0 |
| Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs | Oct 31, 2024 | Red Teaming | —Unverified | 0 |
| Atoxia: Red-teaming Large Language Models with Target Toxic Answers | Aug 27, 2024 | Prompt Engineering, Red Teaming | —Unverified | 0 |
| DiffZOO: A Purely Query-Based Black-Box Attack for Red-teaming Text-to-Image Generative Model via Zeroth Order Optimization | Aug 18, 2024 | Red Teaming | —Unverified | 0 |
| Digital cloning of online social networks for language-sensitive agent-based modeling of misinformation spread | Jan 23, 2024 | Misinformation, Red Teaming | —Unverified | 0 |
| Direct Unlearning Optimization for Robust and Safe Text-to-Image Models | Jul 17, 2024 | Red Teaming | —Unverified | 0 |
| Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning | Dec 24, 2024 | Diversity, Large Language Model | —Unverified | 0 |
| DMRL: Data- and Model-aware Reward Learning for Data Extraction | May 7, 2025 | Prompt Engineering, Red Teaming | —Unverified | 0 |