Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the aim of producing controllable and safe responses.

Papers

Showing 426–450 of 1135 papers

Title	Date	Tasks	Status	Hype	Score
Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following	Feb 28, 2023	Instruction Following, Zero-shot Generalization	Code Available	1	5
BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues	Oct 20, 2023	Instruction Following	Code Available	1	5
Do LLMs "know" internally when they follow instructions?	Oct 18, 2024	Instruction Following, Prompt Engineering	Code Available	1	5
A Dual-Space Framework for General Knowledge Distillation of Large Language Models	Apr 15, 2025	Code Generation, General Knowledge	Code Available	1	5
An Emulator for Fine-Tuning Large Language Models using Small Language Models	Oct 19, 2023	Instruction Following	Code Available	1	5
Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study	Jul 16, 2023	In-Context Learning, Instruction Following	Code Available	1	5
Bactrian-X: Multilingual Replicable Instruction-Following Models with Low-Rank Adaptation	May 24, 2023	Instruction Following	Code Available	1	5
PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning	Feb 20, 2024	Instruction Following, Knowledge Distillation	Code Available	1	5
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning	Oct 23, 2024	Image Captioning, Instruction Following	Code Available	1	5
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement	Sep 17, 2024	Active Learning, Diversity	Code Available	1	5
IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis	May 23, 2025	Instruction Following	Code Available	1	5
Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions	Nov 1, 2023	Few-Shot NLI, Instruction Following	Code Available	1	5
IHEval: Evaluating Language Models on Following the Instruction Hierarchy	Feb 12, 2025	Instruction Following	Code Available	1	5
Infer Human's Intentions Before Following Natural Language Instructions	Sep 26, 2024	Instruction Following	Code Available	1	5
Curiosity-Driven Reinforcement Learning from Human Feedback	Jan 20, 2025	Diversity, Instruction Following	Code Available	1	5
ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments	Jul 26, 2024	Instruction Following	Code Available	1	5
"No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy	Jan 6, 2023	Instruction Following	Code Available	1	5
RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale	Jun 24, 2024	Code Generation, HumanEval	Code Available	1	5
STRICT: Stress Test of Rendering Images Containing Text	May 25, 2025	Image Generation, Instruction Following	Code Available	1	5
OmniGenBench: A Benchmark for Omnipotent Multimodal Generation across 50+ Tasks	May 24, 2025	Image Generation, Instruction Following	Code Available	1	5
Online Continual Learning For Interactive Instruction Following Agents	Mar 12, 2024	Continual Learning, Incremental Learning	Code Available	1	5
Preference-Guided Reflective Sampling for Aligning Language Models	Aug 22, 2024	Document Summarization, Instruction Following	Code Available	0	5
Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models	Oct 31, 2024	Instruction Following, Reranking	Code Available	0	5
Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction	May 22, 2024	Instruction Following	Code Available	0	5
Analysis of Language Change in Collaborative Instruction Following	Sep 9, 2021	Instruction Following	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
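
The page does not define the "Inst-level loose-accuracy" metric used in the table above. As a rough sketch only, assuming the commonly used IFEval-style reading (the score is the percentage of individual instructions followed across all prompts, and "loose" means a response also passes if a relaxed variant of it passes), it could be computed along these lines. The function names, the choice of relaxations, and the sample checks are illustrative assumptions, not the benchmark's official implementation.

```python
from typing import Callable, List, Tuple

# A check takes a response string and reports whether one instruction was followed.
Check = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Relaxed versions of a response (assumed relaxations): drop the first
    and/or last line, and optionally strip markdown asterisks."""
    lines = response.split("\n")
    bases = [
        response,
        "\n".join(lines[1:]),    # drop a leading line such as "Sure, here it is:"
        "\n".join(lines[:-1]),   # drop a trailing line such as a sign-off
        "\n".join(lines[1:-1]),  # drop both
    ]
    return [v.strip() for b in bases for v in (b, b.replace("*", ""))]


def follows_loosely(response: str, check: Check) -> bool:
    """An instruction counts as followed if any non-empty variant passes."""
    return any(check(v) for v in loose_variants(response) if v)


def inst_level_loose_accuracy(samples: List[Tuple[str, List[Check]]]) -> float:
    """Percentage of individual instructions followed, pooled over all prompts."""
    total = 0
    followed = 0
    for response, checks in samples:
        for check in checks:
            total += 1
            followed += follows_loosely(response, check)
    return 100.0 * followed / total if total else 0.0


# Hypothetical data: two responses carrying three instructions in total.
samples = [
    ("**Hello world**", [lambda r: r.startswith("Hello"),
                         lambda r: len(r.split()) >= 5]),
    ("Sure, here it is:\nhello", [lambda r: r == r.lower()]),
]
score = inst_level_loose_accuracy(samples)
```

Because the score pools instructions rather than prompts, a response that satisfies two of its three instructions still contributes partial credit, which is why instruction-level figures are typically higher than prompt-level ones.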