| RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages | Jul 8, 2025 | Red Teaming | CodeCode Available | 0 |
| STACK: Adversarial Attacks on LLM Safeguard Pipelines | Jun 30, 2025 | Red Teaming | —Unverified | 0 |
| We Should Identify and Mitigate Third-Party Safety Risks in MCP-Powered Agent Systems | Jun 16, 2025 | PositionRed Teaming | CodeCode Available | 0 |
| Effective Red-Teaming of Policy-Adherent Agents | Jun 11, 2025 | Red Teaming | —Unverified | 0 |
| GenBreak: Red Teaming Text-to-Image Generators Using Large Language Models | Jun 11, 2025 | Large Language ModelRed Teaming | —Unverified | 0 |
| Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models | Jun 8, 2025 | DiversityRed Teaming | —Unverified | 0 |
| RedDebate: Safer Responses through Multi-Agent Red Teaming Debates | Jun 4, 2025 | Red Teaming | CodeCode Available | 0 |
| RedRFT: A Light-Weight Benchmark for Reinforcement Fine-Tuning-Based Red Teaming | Jun 4, 2025 | Red Teaming | CodeCode Available | 0 |
| BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage | Jun 3, 2025 | Prompt EngineeringRed Teaming | CodeCode Available | 0 |
| Red Teaming AI Policy: A Taxonomy of Avoision and the EU AI Act | Jun 2, 2025 | Red Teaming | —Unverified | 0 |