Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 551–560 of 1135 papers

Title	Date	Tasks	Status
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models	Apr 8, 2025	In-Context Learning, Instruction Following	Unverified
Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations	Apr 8, 2025	Instruction Following, Mixture-of-Experts	Unverified
Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators	Apr 8, 2025	Instruction Following	Unverified
The Hidden Space of Safety: Understanding Preference-Tuned LLMs in Multilingual context	Apr 3, 2025	Instruction Following	Unverified
Pay More Attention to the Robustness of Prompt for Instruction Data Mining	Mar 31, 2025	Instruction Following	Unverified
Effectively Controlling Reasoning Models through Thinking Intervention	Mar 31, 2025	Instruction Following, Safety Alignment	Unverified
Learning to Instruct for Visual Instruction Tuning	Mar 28, 2025	Hallucination, Instruction Following	Unverified
Gemma 3 Technical Report	Mar 25, 2025	Instruction Following, Math	Unverified
OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence	Mar 20, 2025	Instruction Following, Natural Language Understanding	Unverified
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings	Mar 19, 2025	Instruction Following, Large Language Model	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
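
The table reports "Inst-level loose-accuracy" without defining it or naming the benchmark. As a rough sketch, assuming the metric follows the common IFEval-style convention (each instruction in a prompt is scored separately, and a response counts as compliant if any lightly relaxed variant of it passes the check), it could be computed along these lines. The function names, the checker signature, and the exact set of relaxations are illustrative assumptions, not taken from this page.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical checker type: takes a response, returns True if one
# instruction (e.g. "answer in lowercase") is satisfied.
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Relaxed variants of a response (assumed IFEval-style loosening)."""
    lines = response.split("\n")
    no_first = "\n".join(lines[1:]).strip()    # drop a leading preamble line
    no_last = "\n".join(lines[:-1]).strip()    # drop a trailing sign-off line
    no_both = "\n".join(lines[1:-1]).strip()   # drop both
    candidates = [response, no_first, no_last, no_both]
    # Also try every candidate with markdown asterisks stripped.
    candidates += [c.replace("*", "") for c in candidates]
    return [c for c in candidates if c]        # discard empty variants


def inst_level_loose_accuracy(
    samples: Sequence[Tuple[str, Sequence[Checker]]],
) -> float:
    """Fraction of individual instructions satisfied by at least one variant."""
    followed = 0
    total = 0
    for response, checkers in samples:
        variants = loose_variants(response)
        for check in checkers:
            total += 1
            if any(check(v) for v in variants):
                followed += 1
    return followed / total if total else 0.0
```

Here each sample pairs one model response with the checkers for the instructions in its prompt. Because every instruction is counted individually, a response that follows two of three instructions contributes 2 of 3 to the score, whereas a prompt-level metric would count the whole prompt as failed.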