| Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models? | Oct 16, 2023 | Red Teaming | CodeCode Available | 1 |
| Large Language Model Unlearning | Oct 14, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation | Oct 10, 2023 | Red Teaming | CodeCode Available | 1 |
| Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts | Sep 12, 2023 | Red TeamingText-to-Image Generation | CodeCode Available | 1 |
| Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment | Aug 18, 2023 | MMLURed Teaming | CodeCode Available | 1 |
| XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models | Aug 2, 2023 | Language ModellingRed Teaming | CodeCode Available | 1 |
| Jailbroken: How Does LLM Safety Training Fail? | Jul 5, 2023 | Red Teaming | CodeCode Available | 1 |
| Explore, Establish, Exploit: Red Teaming Language Models from Scratch | Jun 15, 2023 | Red Teaming | CodeCode Available | 1 |
| Red Teaming Language Model Detectors with Language Models | May 31, 2023 | Adversarial RobustnessLanguage Modeling | CodeCode Available | 1 |
| Query-Efficient Black-Box Red Teaming via Bayesian Optimization | May 27, 2023 | Bayesian OptimizationLanguage Modeling | CodeCode Available | 1 |