| Title | Date | Topics | Code |
| --- | --- | --- | --- |
| RedRFT: A Light-Weight Benchmark for Reinforcement Fine-Tuning-Based Red Teaming | Jun 4, 2025 | Red Teaming | Code Available |
| SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis | Oct 21, 2024 | LLM Jailbreak, Red Teaming | Code Available |
| BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage | Jun 3, 2025 | Prompt Engineering, Red Teaming | Code Available |
| Bias patterns in the application of LLMs for clinical decision support: A comprehensive study | Apr 23, 2024 | Decision Making, Question Answering | Code Available |
| RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages | Jul 8, 2025 | Red Teaming | Code Available |
| Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety | May 11, 2025 | Outlier Detection, Red Teaming | Code Available |
| Overriding Safety protections of Open-source Models | Sep 28, 2024 | Red Teaming, Safety Alignment | Code Available |
| Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models | Aug 27, 2024 | Red Teaming, Transfer Learning | Code Available |
| Aligners: Decoupling LLMs and Alignment | Mar 7, 2024 | Instruction Following, Red Teaming | Code Available |
| BiasJailbreak: Analyzing Ethical Biases and Jailbreak Vulnerabilities in Large Language Models | Oct 17, 2024 | Red Teaming, Safety Alignment | Code Available |