| Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models | Mar 3, 2025 | Red Teaming, Survey | —Unverified | 0 |
| Can Language Models be Instructed to Protect Personal Information? | Oct 3, 2023 | Adversarial Robustness, Red Teaming | —Unverified | 0 |
| Can Large Language Models Automatically Jailbreak GPT-4V? | Jul 23, 2024 | Face Recognition, In-Context Learning | —Unverified | 0 |
| Can Large Language Models Change User Preference Adversarially? | Jan 5, 2023 | Red Teaming | —Unverified | 0 |
| CELL your Model: Contrastive Explanations for Large Language Models | Jun 17, 2024 | Red Teaming, Text Generation | —Unverified | 0 |
| Computational Red Teaming in a Sudoku Solving Context: Neural Network Based Skill Representation and Acquisition | Feb 27, 2018 | Red Teaming | —Unverified | 0 |
| Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming | Jan 31, 2025 | Red Teaming | —Unverified | 0 |
| Conversational Complexity for Assessing Risk in Large Language Models | Sep 2, 2024 | Red Teaming | —Unverified | 0 |
| CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring | May 29, 2025 | Red Teaming | —Unverified | 0 |
| CTI4AI: Threat Intelligence Generation and Sharing after Red Teaming AI Models | Aug 16, 2022 | Red Teaming | —Unverified | 0 |