SOTAVerified

Safety Alignment

Papers

Showing 126–150 of 288 papers

Title | Status | Hype
CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning | | 0
Shape it Up! Restoring LLM Safety during Finetuning | | 0
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | Code | 0
"Haet Bhasha aur Diskrimineshun": Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs | | 0
SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment | | 0
sudoLLM: On Multi-role Alignment of Language Models | | 0
Safety Alignment Can Be Not Superficial With Explicit Safety Signals | | 0
SafeVid: Toward Safety Aligned Video Large Multimodal Models | | 0
JULI: Jailbreak Large Language Models by Self-Introspection | | 0
Noise Injection Systemically Degrades Large Language Model Safety Guardrails | | 0
CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs | | 0
Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data | | 0
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning | | 0
One Trigger Token Is Enough: A Defense Strategy for Balancing Safety and Usability in Large Language Models | | 0
Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety | Code | 0
Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model | Code | 0
NeuRel-Attack: Neuron Relearning for Safety Disalignment in Large Language Models | | 0
SAGE: A Generic Framework for LLM Safety Evaluation | Code | 0
What's Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift | | 0
AI Awareness | | 0
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models | Code | 0
aiXamine: Simplified LLM Safety and Security | | 0
Towards NSFW-Free Text-to-Image Generation via Safety-Constraint Direct Preference Optimization | | 0
Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models | | 0
VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization | | 0
Page 6 of 12

No leaderboard results yet.