| A Formal Framework for Assessing and Mitigating Emergent Security Risks in Generative AI Models: Bridging Theory and Dynamic Risk Mitigation | Oct 15, 2024 | Anomaly Detection, Red Teaming | —Unverified | 0 | 0 |
| AdvAgent: Controllable Blackbox Red-teaming on Web Agents | Oct 22, 2024 | Decision Making, Red Teaming | —Unverified | 0 | 0 |
| Understanding and Mitigating Risks of Generative AI in Financial Services | Apr 25, 2025 | Fairness, Red Teaming | —Unverified | 0 | 0 |
| Adversaries Can Misuse Combinations of Safe Models | Jun 20, 2024 | Red Teaming | —Unverified | 0 | 0 |
| STACK: Adversarial Attacks on LLM Safeguard Pipelines | Jun 30, 2025 | Red Teaming | —Unverified | 0 | 0 |
| STAR: SocioTechnical Approach to Red Teaming Language Models | Jun 17, 2024 | Red Teaming | —Unverified | 0 | 0 |
| AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications | Nov 14, 2023 | Diversity, Red Teaming | —Unverified | 0 | 0 |
| SteerDiff: Steering towards Safe Text-to-Image Diffusion Models | Oct 3, 2024 | Image Generation, Red Teaming | —Unverified | 0 | 0 |
| VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment | Oct 12, 2024 | Diversity, Hallucination | —Unverified | 0 | 0 |
| Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning | Apr 2, 2025 | Red Teaming | —Unverified | 0 | 0 |