| Paper | Date | Topics | Code |
|---|---|---|---|
| Locking Down the Finetuned LLMs Safety | Oct 14, 2024 | Safety Alignment | Code Available |
| Can Editing LLMs Inject Harm? | Jul 29, 2024 | Fairness, General Knowledge | Code Available |
| QueryAttack: Jailbreaking Aligned Large Language Models Using Structured Non-natural Query Language | Feb 13, 2025 | Safety Alignment | Code Available |
| Don't Say No: Jailbreaking LLM by Suppressing Refusal | Apr 25, 2024 | Natural Language Inference, Safety Alignment | Code Available |
| Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment | Nov 27, 2024 | Safety Alignment, Visual Reasoning | Code Available |
| ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates | Jun 17, 2024 | Instruction Following, Safety Alignment | Code Available |
| All Languages Matter: On the Multilingual Safety of Large Language Models | Oct 2, 2023 | Safety Alignment | Code Available |
| Improving LLM Safety Alignment with Dual-Objective Optimization | Mar 5, 2025 | Safety Alignment | Code Available |
| DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt | Jun 11, 2025 | Safety Alignment | Code Available |
| Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization | Oct 5, 2023 | Language Modeling | Code Available |