| Effective Red-Teaming of Policy-Adherent Agents | Jun 11, 2025 | Red Teaming | —Unverified | 0 |
| ELAB: Extensive LLM Alignment Benchmark in Persian Language | Apr 17, 2025 | Fairness, Red Teaming | —Unverified | 0 |
| Embodied Red Teaming for Auditing Robotic Foundation Models | Nov 27, 2024 | Red Teaming | —Unverified | 0 |
| EVA: Red-Teaming GUI Agents via Evolving Indirect Prompt Injection | May 20, 2025 | Red Teaming | —Unverified | 0 |
| Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity | Jan 30, 2023 | Ethics, Language Modelling | —Unverified | 0 |
| Exploring Straightforward Conversational Red-Teaming | Sep 7, 2024 | Red Teaming | —Unverified | 0 |
| Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation | May 24, 2025 | Intent Detection, Natural Language Understanding | —Unverified | 0 |
| Fast Proxies for LLM Robustness Evaluation | Feb 14, 2025 | Red Teaming | —Unverified | 0 |
| Finding Safety Neurons in Large Language Models | Jun 20, 2024 | Misinformation, Red Teaming | —Unverified | 0 |
| FLIRT: Feedback Loop In-context Red Teaming | Aug 8, 2023 | In-Context Learning, Red Teaming | —Unverified | 0 |