SOTAVerified

Red Teaming

Papers

Showing 81–90 of 251 papers

| Title | Status | Hype |
|---|---|---|
| RedRFT: A Light-Weight Benchmark for Reinforcement Fine-Tuning-Based Red Teaming | Code | 0 |
| SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis | Code | 0 |
| BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage | Code | 0 |
| Bias patterns in the application of LLMs for clinical decision support: A comprehensive study | Code | 0 |
| RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages | Code | 0 |
| Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety | Code | 0 |
| Overriding Safety Protections of Open-Source Models | Code | 0 |
| Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models | Code | 0 |
| Aligners: Decoupling LLMs and Alignment | Code | 0 |
| BiasJailbreak: Analyzing Ethical Biases and Jailbreak Vulnerabilities in Large Language Models | Code | 0 |
Page 9 of 26

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SUDO | Attack Success Rate | 41 | | Unverified |