SOTAVerified

Red Teaming

Papers

Showing 231–240 of 251 papers

Title | Status | Hype
TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis | Code | 0
BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage | Code | 0
Distract Large Language Models for Automatic Jailbreak Attack | Code | 0
Look Before You Leap: Enhancing Attention and Vigilance Regarding Harmful Content with GuidelineLLM | Code | 0
Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety | Code | 0
Kov: Transferable and Naturalistic Black-Box LLM Attacks using Markov Decision Processes and Tree Search | Code | 0
RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts | Code | 0
InfoPattern: Unveiling Information Propagation Patterns in Social Media | Code | 0
Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models | Code | 0
SAGE: A Generic Framework for LLM Safety Evaluation | Code | 0
Page 24 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SUDO | Attack Success Rate | 41 | - | Unverified