| Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training | Mar 24, 2025 | DiversityLarge Language Model | CodeCode Available | 1 |
| AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration | Mar 20, 2025 | Red Teaming | —Unverified | 0 |
| MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models | Mar 19, 2025 | Adversarial RobustnessAutonomous Driving | —Unverified | 0 |
| Making Every Step Effective: Jailbreaking Large Vision-Language Models Through Hierarchical KV Equalization | Mar 14, 2025 | Red Teaming | —Unverified | 0 |
| A Framework for Evaluating Emerging Cyberattack Capabilities of AI | Mar 14, 2025 | Red Teaming | —Unverified | 0 |
| Red Teaming Contemporary AI Models: Insights from Spanish and Basque Perspectives | Mar 13, 2025 | Red Teaming | —Unverified | 0 |
| JBFuzz: Jailbreaking LLMs Efficiently and Effectively Using Fuzzing | Mar 12, 2025 | Red TeamingSafety Alignment | —Unverified | 0 |
| MAD-MAX: Modular And Diverse Malicious Attack MiXtures for Automated LLM Red Teaming | Mar 8, 2025 | Red Teaming | —Unverified | 0 |
| Reinforced Diffuser for Red Teaming Large Vision-Language Models | Mar 8, 2025 | Large Language ModelRed Teaming | —Unverified | 0 |
| Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges | Mar 6, 2025 | BenchmarkingLanguage Modeling | —Unverified | 0 |