Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 541–550 of 1135 papers

Title	Date	Tasks	Status
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost	Apr 23, 2025	Instruction Following, Language Modeling	Unverified
Case Study: Fine-tuning Small Language Models for Accurate and Private CWE Detection in Python Code	Apr 23, 2025	Instruction Following, Privacy Preserving	Unverified
DistilQwen2.5: Industrial Practices of Training Distilled Open Lightweight Language Models	Apr 21, 2025	Computational Efficiency, Instruction Following	Unverified
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators	Apr 21, 2025	Code Generation, Instruction Following	Code Available
Improving Instruct Models for Free: A Study on Partial Adaptation	Apr 15, 2025	Few-Shot Learning, In-Context Learning	Unverified
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning	Apr 12, 2025	Instruction Following	Unverified
Playpen: An Environment for Exploring Learning Through Conversational Interaction	Apr 11, 2025	Instruction Following, Large Language Model	Code Available
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding	Apr 10, 2025	Instruction Following, Video Understanding	Unverified
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models	Apr 10, 2025	Instruction Following	Unverified
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models	Apr 9, 2025	Instruction Following, Mathematical Problem-Solving	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified