| Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming | Jun 17, 2024 | Diversity, Red Teaming | Unverified | 0 |
| Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding | Jun 17, 2024 | 16k, Language Modelling | Code Available | 0 |
| CELL your Model: Contrastive Explanations for Large Language Models | Jun 17, 2024 | Red Teaming, Text Generation | Unverified | 0 |
| STAR: SocioTechnical Approach to Red Teaming Language Models | Jun 17, 2024 | Red Teaming | Unverified | 0 |
| Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters | May 30, 2024 | Red Teaming | Unverified | 0 |
| Safety Alignment for Vision Language Models | May 22, 2024 | Red Teaming, Safety Alignment | Unverified | 0 |
| Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming | May 21, 2024 | Red Teaming | Unverified | 0 |
| Red Teaming Language Models for Processing Contradictory Dialogues | May 16, 2024 | Red Teaming, valid | Code Available | 0 |
| A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI | Apr 23, 2024 | Prompt Engineering, Red Teaming | Unverified | 0 |
| Bias patterns in the application of LLMs for clinical decision support: A comprehensive study | Apr 23, 2024 | Decision Making, Question Answering | Code Available | 0 |