SOTAVerified

Instruction Following

Instruction following is a fundamental capability of large language models. This task evaluates how well a model follows human instructions, with the goal of generating controllable and safe responses.

Papers

Showing 301–350 of 1135 papers

Title | Status | Hype
Instruction Position Matters in Sequence Generation with Large Language Models | Code | 1
InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction | Code | 1
AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data | Code | 1
Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness | Code | 1
Mosaic-IT: Free Compositional Data Augmentation Improves Instruction Tuning | Code | 1
NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models | Code | 1
Instruct and Extract: Instruction Tuning for On-Demand Information Extraction | Code | 1
Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement | Code | 1
MoDS: Model-oriented Data Selection for Instruction Tuning | Code | 1
Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages | Code | 1
Factorizing Perception and Policy for Interactive Instruction Following | Code | 1
Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design | Code | 1
InfMLLM: A Unified Framework for Visual-Language Tasks | Code | 1
Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following | Code | 1
Instruction-Following Agents with Multimodal Transformer | Code | 1
AceGPT, Localizing Large Language Models in Arabic | Code | 1
ChatGPT may Pass the Bar Exam soon, but has a Long Way to Go for the LexGLUE benchmark | Code | 1
Improving Translation Faithfulness of Large Language Models via Augmenting Instructions | Code | 1
EventHallusion: Diagnosing Event Hallucinations in Video LLMs | Code | 1
Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models | Code | 1
Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach | Code | 1
MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios | Code | 1
Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding | Code | 1
Agri-LLaVA: Knowledge-Infused Large Multimodal Assistant on Agricultural Pests and Diseases | Code | 1
Infer Human's Intentions Before Following Natural Language Instructions | Code | 1
Inferring Rewards from Language in Context | Code | 1
IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis | Code | 1
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates | Code | 1
MMIDR: Teaching Large Language Model to Interpret Multimodal Misinformation via Knowledge Distillation | Code | 1
Evaluating LLMs at Detecting Errors in LLM Responses | Code | 1
Evaluating Large Language Models at Evaluating Instruction Following | Code | 1
IHEval: Evaluating Language Models on Following the Instruction Hierarchy | Code | 1
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios | Code | 1
Facial Affective Behavior Analysis with Instruction Tuning | Code | 1
Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering | Code | 1
Hybrid Alignment Training for Large Language Models | Code | 1
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models | Code | 1
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction | Code | 1
MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning | Code | 1
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4 | Code | 1
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs | Code | 1
Instruction-Guided Visual Masking | Code | 1
Few-shot Object Grounding and Mapping for Natural Language Robot Instruction Following | Code | 1
Alexa Arena: A User-Centric Interactive Platform for Embodied AI | Code | 1
ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning | Code | 1
Finding Blind Spots in Evaluator LLMs with Interpretable Checklists | Code | 1
M-IFEval: Multilingual Instruction-Following Evaluation | Code | 1
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Code | 1
DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation | Code | 1
Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping | Code | 1
Page 7 of 23

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AutoIF (Llama3 70B) | Inst-level loose-accuracy | 90.4 | | Unverified
2 | AutoIF (Qwen2 72B) | Inst-level loose-accuracy | 88.0 | | Unverified
3 | GPT-4 | Inst-level loose-accuracy | 85.37 | | Unverified
4 | PaLM 2 S | Inst-level loose-accuracy | 59.11 | | Unverified
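The "Inst-level" metric above aggregates over individual instructions rather than whole prompts: a prompt that bundles several instructions contributes one pass/fail flag per instruction. A minimal sketch of that aggregation, assuming per-instruction boolean judgments are already available (the function name and input layout here are illustrative, not the benchmark's actual API):

```python
def inst_level_accuracy(results):
    """Instruction-level accuracy in percent.

    results: list of lists of booleans, one inner list per prompt,
    one boolean per instruction (True = instruction was followed).
    """
    # Flatten so every instruction counts equally, regardless of
    # how many instructions its prompt contained.
    flags = [ok for prompt in results for ok in prompt]
    return 100.0 * sum(flags) / len(flags) if flags else 0.0

# Example: 3 prompts with 2, 3, and 1 instructions respectively;
# 4 of the 6 instructions were followed.
score = inst_level_accuracy([[True, True], [True, False, True], [False]])
```

The "loose" variant typically refers to lenient response parsing before judging each instruction (e.g. stripping markdown or boilerplate), which affects the booleans fed in, not the aggregation itself.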