Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 411–420 of 1135 papers

Title	Date	Tasks	Status	Hype
InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction	Mar 26, 2025	Instruction Following, Video Editing	Code Available	1
Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach	Jun 5, 2024	Image Retrieval, Instruction Following	Code Available	1
Is In-Context Learning Sufficient for Instruction Following in LLMs?	May 30, 2024	In-Context Learning, Instruction Following	Code Available	1
BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues	Oct 20, 2023	Instruction Following	Code Available	1
INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models	Feb 22, 2024	Information Retrieval, Instruction Following	Code Available	1
Do LLMs "know" internally when they follow instructions?	Oct 18, 2024	Instruction Following, Prompt Engineering	Code Available	1
A Dual-Space Framework for General Knowledge Distillation of Large Language Models	Apr 15, 2025	Code Generation, General Knowledge	Code Available	1
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data	Nov 22, 2023	Attribute, counterfactual	Code Available	1
Guiding Multi-Step Rearrangement Tasks with Natural Language Instructions	Nov 8, 2021	Instruction Following	Code Available	1
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems	Jun 19, 2025	Benchmarking, Descriptive	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
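
The page does not define the "Inst-level loose-accuracy" metric. The name matches the one used by the IFEval benchmark, so the sketch below assumes that meaning: the percentage of individual instructions satisfied across all prompts, where a response counts as following an instruction if any relaxed variant of it passes the check. The relaxations (dropping the first line, the last line, both, and Markdown asterisks) and all function names here are illustrative assumptions, not the benchmark's reference implementation.

```python
from typing import Callable, List, Sequence

# A checker returns True when a response satisfies one instruction.
# (Hypothetical type for illustration.)
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the response plus relaxed variants (assumed relaxations)."""
    lines = response.split("\n")
    bases = [
        response,
        "\n".join(lines[1:]).strip(),    # drop first line (e.g. a preamble)
        "\n".join(lines[:-1]).strip(),   # drop last line (e.g. a sign-off)
        "\n".join(lines[1:-1]).strip(),  # drop both
    ]
    # Also try each variant with Markdown emphasis asterisks removed.
    variants = bases + [b.replace("*", "") for b in bases]
    return [v for v in variants if v]


def follows_loose(response: str, checker: Checker) -> bool:
    """Loose check: the instruction counts as followed if any variant passes."""
    return any(checker(v) for v in loose_variants(response))


def inst_level_loose_accuracy(
    responses: Sequence[str],
    checkers_per_prompt: Sequence[Sequence[Checker]],
) -> float:
    """Percentage of individual instructions followed, pooled over all prompts."""
    total = 0
    followed = 0
    for response, checkers in zip(responses, checkers_per_prompt):
        for checker in checkers:
            total += 1
            followed += follows_loose(response, checker)
    return 100.0 * followed / total if total else 0.0
```

Because instructions are pooled across prompts, a prompt carrying three instructions contributes three times as much to this score as a prompt carrying one; a prompt-level variant would instead require every instruction in a prompt to pass.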