SOTAVerified

Safety Alignment

Papers

Showing 61–70 of 288 papers

Title | Status | Hype
QueryAttack: Jailbreaking Aligned Large Language Models Using Structured Non-natural Query Language | Code | 1
Don't Say No: Jailbreaking LLM by Suppressing Refusal | Code | 1
Bayesian scaling laws for in-context learning | Code | 1
MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance | Code | 1
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts | Code | 1
Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable | Code | 1
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset | Code | 1
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! | Code | 1
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment | Code | 1
MPO: Multilingual Safety Alignment via Reward Gap Optimization | Code | 1
Page 7 of 29

No leaderboard results yet.