| Title | Date | Tags | Code |
| --- | --- | --- | --- |
| Offensive Security for AI Systems: Concepts, Practices, and Applications | May 9, 2025 | Red Teaming | Unverified |
| AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents | May 9, 2025 | Navigate, Red Teaming | Unverified |
| Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods | May 8, 2025 | Red Teaming, Systematic Literature Review | Unverified |
| DMRL: Data- and Model-aware Reward Learning for Data Extraction | May 7, 2025 | Prompt Engineering, Red Teaming | Unverified |
| Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs | May 7, 2025 | Red Teaming | Unverified |
| Red Teaming Large Language Models for Healthcare | May 1, 2025 | Language Modeling | Unverified |
| When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines | Apr 29, 2025 | Red Teaming | Unverified |
| SAGE: A Generic Framework for LLM Safety Evaluation | Apr 28, 2025 | Red Teaming, Safety Alignment | Code Available |
| Understanding and Mitigating Risks of Generative AI in Financial Services | Apr 25, 2025 | Fairness, Red Teaming | Unverified |
| RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models | Apr 25, 2025 | RAG, Red Teaming | Unverified |
| ELAB: Extensive LLM Alignment Benchmark in Persian Language | Apr 17, 2025 | Fairness, Red Teaming | Unverified |
| X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents | Apr 15, 2025 | Diversity, Red Teaming | Unverified |
| The Structural Safety Generalization Problem | Apr 13, 2025 | Red Teaming | Code Available |
| Multi-lingual Multi-turn Automated Red Teaming for LLMs | Apr 4, 2025 | Red Teaming | Unverified |
| Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning | Apr 2, 2025 | Red Teaming | Unverified |
| Red Teaming with Artificial Intelligence-Driven Cyberattacks: A Scoping Review | Mar 25, 2025 | Articles, Red Teaming | Unverified |
| AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration | Mar 20, 2025 | Red Teaming | Unverified |
| MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models | Mar 19, 2025 | Adversarial Robustness, Autonomous Driving | Unverified |
| Making Every Step Effective: Jailbreaking Large Vision-Language Models Through Hierarchical KV Equalization | Mar 14, 2025 | Red Teaming | Unverified |
| A Framework for Evaluating Emerging Cyberattack Capabilities of AI | Mar 14, 2025 | Red Teaming | Unverified |
| Red Teaming Contemporary AI Models: Insights from Spanish and Basque Perspectives | Mar 13, 2025 | Red Teaming | Unverified |
| JBFuzz: Jailbreaking LLMs Efficiently and Effectively Using Fuzzing | Mar 12, 2025 | Red Teaming, Safety Alignment | Unverified |
| MAD-MAX: Modular And Diverse Malicious Attack MiXtures for Automated LLM Red Teaming | Mar 8, 2025 | Red Teaming | Unverified |
| Reinforced Diffuser for Red Teaming Large Vision-Language Models | Mar 8, 2025 | Large Language Model, Red Teaming | Unverified |
| Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges | Mar 6, 2025 | Benchmarking, Language Modeling | Unverified |