Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 401–425 of 1135 papers

Title	Date	Tasks	Status	Hype
Large Language Models as Evaluators for Recommendation Explanations	Jun 5, 2024	Common Sense Reasoning, Instruction Following	Code Available	1
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner	Nov 29, 2023	Contrastive Learning, Image Captioning	Code Available	1
Language-Conditioned Reinforcement Learning to Solve Misunderstandings with Action Corrections	Nov 18, 2022	Instruction Following, reinforcement-learning	Code Available	1
RuleRAG: Rule-guided retrieval-augmented generation with language models for question answering	Oct 15, 2024	In-Context Learning, Instruction Following	Code Available	1
Language Imbalance Driven Rewarding for Multilingual Self-improving	Oct 11, 2024	Arithmetic Reasoning, Instruction Following	Code Available	1
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection	Oct 10, 2024	Instruction Following	Code Available	1
Answer is All You Need: Instruction-following Text Embedding via Answering the Question	Feb 15, 2024	abstractive question answering, All	Code Available	1
Lana: A Language-Capable Navigator for Instruction Following and Generation	Mar 15, 2023	Instruction Following, Text Generation	Code Available	1
Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation	Jan 12, 2024	Instruction Following, Translation	Code Available	1
An In-depth Look at Gemini's Language Abilities	Dec 18, 2023	Instruction Following, Math	Code Available	1
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization	Aug 14, 2024	Informativeness, Instruction Following	Code Available	1
Don't Reinvent the Wheel: Efficient Instruction-Following Text Embedding based on Guided Space Transformation	May 30, 2025	Instruction Following	Code Available	1
IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators	Mar 6, 2024	Code Completion, Code Generation	Code Available	1
Is In-Context Learning Sufficient for Instruction Following in LLMs?	May 30, 2024	In-Context Learning, Instruction Following	Code Available	1
InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction	Mar 26, 2025	Instruction Following, Video Editing	Code Available	1
Inferring Rewards from Language in Context	Apr 5, 2022	Instruction Following, Reinforcement Learning (RL)	Code Available	1
BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues	Oct 20, 2023	Instruction Following	Code Available	1
Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach	Jun 5, 2024	Image Retrieval, Instruction Following	Code Available	1
Do LLMs "know" internally when they follow instructions?	Oct 18, 2024	Instruction Following, Prompt Engineering	Code Available	1
Creative Agents: Empowering Agents with Imagination for Creative Tasks	Dec 5, 2023	Instruction Following, Language Modelling	Code Available	1
A Dual-Space Framework for General Knowledge Distillation of Large Language Models	Apr 15, 2025	Code Generation, General Knowledge	Code Available	1
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems	Jun 19, 2025	Benchmarking, Descriptive	Code Available	1
Investigating Instruction Tuning Large Language Models on Graphs	Aug 10, 2024	Instruction Following	Code Available	1
Jatmo: Prompt Injection Defense by Task-Specific Finetuning	Dec 29, 2023	Instruction Following	Code Available	1
Instruction Position Matters in Sequence Generation with Large Language Models	Aug 23, 2023	Instruction Following, Position	Code Available	1


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
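
The table above reports "Inst-level loose-accuracy" without defining it. The sketch below shows one plausible reading, assuming the metric follows the convention of IFEval-style evaluations: every individual instruction counts once, and under loose scoring an instruction counts as followed if any relaxed variant of the response passes its check. The function names, the checker signature, and the exact set of relaxations are illustrative assumptions, not taken from this page or from any benchmark's reference code.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical checker type: returns True if the response follows one instruction.
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Relaxed variants of a response (assumed set of relaxations):
    drop the first line, the last line, or both, and strip asterisks."""
    lines = response.split("\n")
    base = [
        response,
        "\n".join(lines[1:]).strip(),
        "\n".join(lines[:-1]).strip(),
        "\n".join(lines[1:-1]).strip(),
    ]
    return base + [v.replace("*", "") for v in base]


def inst_level_accuracy(
    samples: Sequence[Tuple[str, Sequence[Checker]]], loose: bool = True
) -> float:
    """Percentage of individual instructions followed across all samples."""
    followed = 0
    total = 0
    for response, checkers in samples:
        candidates = loose_variants(response) if loose else [response]
        candidates = [c for c in candidates if c]  # skip empty variants
        for check in checkers:
            total += 1
            if any(check(c) for c in candidates):
                followed += 1
    return 100.0 * followed / total if total else 0.0


if __name__ == "__main__":
    demo = [
        # Two instructions: answer in all caps, and mention "world".
        ("Sure, here it is:\nHELLO WORLD",
         [lambda r: r.isupper(), lambda r: "world" in r.lower()]),
        # One instruction: use at least five words.
        ("short answer", [lambda r: len(r.split()) >= 5]),
    ]
    print("strict:", inst_level_accuracy(demo, loose=False))
    print("loose: ", inst_level_accuracy(demo, loose=True))
```

In the demo, the all-caps instruction fails under strict scoring because of the introductory line, but passes under loose scoring once that line is dropped. This is the kind of surface-level mismatch loose scoring is meant to forgive.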