| Title | Date | Tags | Code | Stars |
| --- | --- | --- | --- | --- |
| Offensive Security for AI Systems: Concepts, Practices, and Applications | May 9, 2025 | Red Teaming | Unverified | 0 |
| AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents | May 9, 2025 | Navigate, Red Teaming | Unverified | 0 |
| Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods | May 8, 2025 | Red Teaming, Systematic Literature Review | Unverified | 0 |
| DMRL: Data- and Model-aware Reward Learning for Data Extraction | May 7, 2025 | Prompt Engineering, Red Teaming | Unverified | 0 |
| Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs | May 7, 2025 | Red Teaming | Unverified | 0 |
| Red Teaming Large Language Models for Healthcare | May 1, 2025 | Language Modeling | Unverified | 0 |
| When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines | Apr 29, 2025 | Red Teaming | Unverified | 0 |
| SAGE: A Generic Framework for LLM Safety Evaluation | Apr 28, 2025 | Red Teaming, Safety Alignment | Code Available | 0 |
| Understanding and Mitigating Risks of Generative AI in Financial Services | Apr 25, 2025 | Fairness, Red Teaming | Unverified | 0 |
| RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models | Apr 25, 2025 | RAG, Red Teaming | Unverified | 0 |