| Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations | Jun 25, 2024 | Red Teaming, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| LLM-Assisted Red Teaming of Diffusion Models through "Failures Are Fated, But Can Be Faded" | Oct 22, 2024 | Deep Reinforcement Learning, Red Teaming | —Unverified | 0 | 0 |
| Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming | Jan 31, 2025 | Red Teaming | —Unverified | 0 | 0 |
| LLM-Safety Evaluations Lack Robustness | Mar 4, 2025 | Red Teaming, Response Generation | —Unverified | 0 | 0 |
| LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs | Nov 13, 2024 | Prompt Engineering, Red Teaming | —Unverified | 0 | 0 |
| The Human Factor in AI Red Teaming: Perspectives from Social and Collaborative Computing | Jul 10, 2024 | Fairness, Red Teaming | —Unverified | 0 | 0 |
| LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B | Oct 31, 2023 | GPU, Red Teaming | —Unverified | 0 | 0 |
| Low-Resource Languages Jailbreak GPT-4 | Oct 3, 2023 | Red Teaming | —Unverified | 0 | 0 |
| MAD-MAX: Modular And Diverse Malicious Attack MiXtures for Automated LLM Red Teaming | Mar 8, 2025 | Red Teaming | —Unverified | 0 | 0 |
| Making Every Step Effective: Jailbreaking Large Vision-Language Models Through Hierarchical KV Equalization | Mar 14, 2025 | Red Teaming | —Unverified | 0 | 0 |