| The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Safety Analysis | Feb 13, 2025 | Safety Alignment | Code Available | 3 |
| Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey | Sep 26, 2024 | Safety Alignment | Code Available | 3 |
| The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs | Jul 15, 2025 | Code Generation, Safety Alignment | Code Available | 2 |
| PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks | May 20, 2025 | LLM Jailbreak, Safety Alignment | Code Available | 2 |
| Think Twice Before You Act: Enhancing Agent Behavioral Safety with Thought Correction | May 16, 2025 | Contrastive Learning, Safety Alignment | Code Available | 2 |
| LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation | Apr 10, 2025 | Code Generation, Continual Learning | Code Available | 2 |
| STAIR: Improving Safety Alignment with Introspective Reasoning | Feb 4, 2025 | Safety Alignment | Code Available | 2 |
| Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation | Jan 29, 2025 | Red Teaming, Safety Alignment | Code Available | 2 |
| Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues | Oct 14, 2024 | LLM Jailbreak, Safety Alignment | Code Available | 2 |
| Cross-Modality Safety Alignment | Jun 21, 2024 | Safety Alignment | Code Available | 2 |