| ELAB: Extensive LLM Alignment Benchmark in Persian Language | Apr 17, 2025 | FairnessRed Teaming | —Unverified | 0 |
| Embodied Red Teaming for Auditing Robotic Foundation Models | Nov 27, 2024 | Red Teaming | —Unverified | 0 |
| EVA: Red-Teaming GUI Agents via Evolving Indirect Prompt Injection | May 20, 2025 | Red Teaming | —Unverified | 0 |
| Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity | Jan 30, 2023 | EthicsLanguage Modelling | —Unverified | 0 |
| Exploring Straightforward Conversational Red-Teaming | Sep 7, 2024 | Red Teaming | —Unverified | 0 |
| Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation | May 24, 2025 | Intent DetectionNatural Language Understanding | —Unverified | 0 |
| Fast Proxies for LLM Robustness Evaluation | Feb 14, 2025 | Red Teaming | —Unverified | 0 |
| Finding Safety Neurons in Large Language Models | Jun 20, 2024 | MisinformationRed Teaming | —Unverified | 0 |
| FLIRT: Feedback Loop In-context Red Teaming | Aug 8, 2023 | In-Context LearningRed Teaming | —Unverified | 0 |
| Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols | Sep 12, 2024 | Decision MakingRed Teaming | —Unverified | 0 |
| GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization | May 25, 2025 | Large Language ModelRed Teaming | —Unverified | 0 |
| h4rm3l: A language for Composable Jailbreak Attack Synthesis | Aug 9, 2024 | BenchmarkingProgram Synthesis | —Unverified | 0 |
| "Haet Bhasha aur Diskrimineshun": Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs | May 20, 2025 | Image GenerationRed Teaming | —Unverified | 0 |
| Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents | May 20, 2025 | Contrastive LearningRed Teaming | —Unverified | 0 |
| HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback | Mar 13, 2024 | Language ModellingLarge Language Model | —Unverified | 0 |
| In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models | Nov 25, 2024 | Red TeamingSemantic Similarity | —Unverified | 0 |
| Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis | Oct 21, 2024 | Red Teaming | —Unverified | 0 |
| Investigating Bias Representations in Llama 2 Chat via Activation Steering | Feb 1, 2024 | Decision MakingRed Teaming | —Unverified | 0 |
| IterAlign: Iterative Constitutional Alignment of Large Language Models | Mar 27, 2024 | Red Teaming | —Unverified | 0 |
| JAB: Joint Adversarial Prompting and Belief Augmentation | Nov 16, 2023 | Red Teaming | —Unverified | 0 |
| Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts | Nov 15, 2023 | Adversarial AttackRed Teaming | —Unverified | 0 |
| Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters | May 30, 2024 | Red Teaming | —Unverified | 0 |
| Jailbreaking Large Language Models with Symbolic Mathematics | Sep 17, 2024 | Red Teaming | —Unverified | 0 |
| Red Teaming AI Policy: A Taxonomy of Avoision and the EU AI Act | Jun 2, 2025 | Red Teaming | —Unverified | 0 |
| Red Teaming Contemporary AI Models: Insights from Spanish and Basque Perspectives | Mar 13, 2025 | Red Teaming | —Unverified | 0 |
| Red-Teaming for Generative AI: Silver Bullet or Security Theater? | Jan 29, 2024 | Red Teaming | —Unverified | 0 |
| Red Teaming Generative AI/NLP, the BB84 quantum cryptography protocol and the NIST-approved Quantum-Resistant Cryptographic Algorithms | Sep 17, 2023 | Red Teaming | —Unverified | 0 |
| Red Teaming Large Language Models for Healthcare | May 1, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI | Mar 12, 2024 | Hyperspectral image analysisHYPERVIEW Challenge | —Unverified | 0 |
| Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling | May 27, 2025 | Red Teaming | —Unverified | 0 |
| Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs | May 7, 2025 | Red Teaming | —Unverified | 0 |
| Red-Teaming the Stable Diffusion Safety Filter | Oct 3, 2022 | Image GenerationRed Teaming | —Unverified | 0 |
| Red Teaming Visual Language Models | Jan 23, 2024 | FairnessRed Teaming | —Unverified | 0 |
| Red Teaming with Artificial Intelligence-Driven Cyberattacks: A Scoping Review | Mar 25, 2025 | ArticlesRed Teaming | —Unverified | 0 |
| Reinforced Diffuser for Red Teaming Large Vision-Language Models | Mar 8, 2025 | Large Language ModelRed Teaming | —Unverified | 0 |
| RRTL: Red Teaming Reasoning Large Language Models in Tool Learning | May 21, 2025 | Red Teaming | —Unverified | 0 |
| Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming | Jun 17, 2024 | DiversityRed Teaming | —Unverified | 0 |
| SafeCOMM: What about Safety Alignment in Fine-Tuned Telecom Large Language Models? | May 29, 2025 | DiagnosticRed Teaming | —Unverified | 0 |
| Safety Alignment for Vision Language Models | May 22, 2024 | Red TeamingSafety Alignment | —Unverified | 0 |
| Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods | May 8, 2025 | Red TeamingSystematic Literature Review | —Unverified | 0 |
| SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming | Aug 14, 2024 | Red TeamingSafety Alignment | —Unverified | 0 |
| Seeing Seeds Beyond Weeds: Green Teaming Generative AI for Beneficial Uses | May 30, 2023 | Red Teaming | —Unverified | 0 |
| Shaping Influence and Influencing Shaping: A Computational Red Teaming Trust-based Swarm Intelligence Model | Feb 26, 2018 | Red Teaming | —Unverified | 0 |
| STACK: Adversarial Attacks on LLM Safeguard Pipelines | Jun 30, 2025 | Red Teaming | —Unverified | 0 |
| STAR: SocioTechnical Approach to Red Teaming Language Models | Jun 17, 2024 | Red Teaming | —Unverified | 0 |
| SteerDiff: Steering towards Safe Text-to-Image Diffusion Models | Oct 3, 2024 | Image GenerationRed Teaming | —Unverified | 0 |
| Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning | Apr 2, 2025 | Red Teaming | —Unverified | 0 |
| Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming | Nov 10, 2023 | Red Teaming | —Unverified | 0 |
| Testing and Evaluation of Large Language Models: Correctness, Non-Toxicity, and Fairness | Aug 31, 2024 | FairnessLanguage Modeling | —Unverified | 0 |
| Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints | Jan 14, 2025 | Large Language ModelRed Teaming | —Unverified | 0 |