| Don't Say No: Jailbreaking LLM by Suppressing Refusal | Apr 25, 2024 | Natural Language Inference, Safety Alignment | Code Available | 1 | 5 |
| Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization | Oct 5, 2023 | Language Modeling | Code Available | 1 | 5 |
| Locking Down the Finetuned LLMs Safety | Oct 14, 2024 | Safety Alignment | Code Available | 1 | 5 |
| Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack | May 28, 2024 | Safety Alignment | Code Available | 1 | 5 |
| AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models | Oct 23, 2023 | Adversarial Attack, Blocking | Code Available | 1 | 5 |
| DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt | Jun 11, 2025 | Safety Alignment | Code Available | 1 | 5 |
| Autonomous Microscopy Experiments through Large Language Model Agents | Dec 18, 2024 | Benchmarking, Experimental Design | Code Available | 1 | 5 |
| Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment | Nov 27, 2024 | Safety Alignment, Visual Reasoning | Code Available | 1 | 5 |
| Trojan Activation Attack: Red-Teaming Large Language Models using Activation Steering for Safety-Alignment | Nov 15, 2023 | Red Teaming, Safety Alignment | Code Available | 1 | 5 |
| QueryAttack: Jailbreaking Aligned Large Language Models Using Structured Non-natural Query Language | Feb 13, 2025 | Safety Alignment | Code Available | 1 | 5 |