Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 221–230 of 1135 papers

Title	Date	Tasks	Status	Hype
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization	Nov 15, 2023	Benchmarking, Instruction Following	Code Available	1
A modular vision language navigation and manipulation framework for long horizon compositional tasks in indoor environment	Jan 19, 2021	Instruction Following, Vision-Language Navigation	Code Available	1
Lana: A Language-Capable Navigator for Instruction Following and Generation	Mar 15, 2023	Instruction Following, Text Generation	Code Available	1
AlpaGasus: Training A Better Alpaca with Fewer Data	Jul 17, 2023	Instruction Following	Code Available	1
Language-Conditioned Reinforcement Learning to Solve Misunderstandings with Action Corrections	Nov 18, 2022	Instruction Following, reinforcement-learning	Code Available	1
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models	Dec 8, 2024	Instruction Following, Natural Language Understanding	Code Available	1
DANLI: Deliberative Agent for Following Natural Language Instructions	Oct 22, 2022	Instruction Following, Vision-Language Navigation	Code Available	1
AlpaCare: Instruction-tuned Large Language Models for Medical Application	Oct 23, 2023	Diversity, Instruction Following	Code Available	1
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations	Oct 24, 2024	Instruction Following, Question Answering	Code Available	1
GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing	May 16, 2025	Instruction Following, Multiple-choice	Code Available	1


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
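
The table reports "Inst-level loose-accuracy" without defining it or naming the benchmark. The sketch below shows one common reading of such a metric, as an assumption rather than this leaderboard's actual scoring code: each prompt carries several verifiable instructions, the score is the percentage of individual instructions satisfied across all prompts, and "loose" means an instruction counts as followed if any relaxed variant of the response passes its check. The relaxation set used here (dropping the first line, the last line, or both, and stripping markdown asterisks), the function names, and the sample data are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Checker = Callable[[str], bool]  # returns True if the response satisfies one instruction


def loose_variants(response: str) -> List[str]:
    """Return the response plus relaxed variants (assumed relaxation set)."""
    lines = response.split("\n")
    base = [
        response,
        "\n".join(lines[1:]).strip(),    # drop a possible preamble line
        "\n".join(lines[:-1]).strip(),   # drop a possible closing line
        "\n".join(lines[1:-1]).strip(),  # drop both
    ]
    # Also try every variant with markdown asterisks removed.
    return base + [v.replace("*", "") for v in base]


def follows_loosely(response: str, check: Checker) -> bool:
    """An instruction counts as followed if any non-empty variant passes."""
    return any(v.strip() and check(v) for v in loose_variants(response))


def instruction_level_loose_accuracy(
    samples: Sequence[Tuple[str, Sequence[Checker]]]
) -> float:
    """Percentage of individual instructions followed, pooled over all prompts."""
    total = 0
    passed = 0
    for response, checkers in samples:
        for check in checkers:
            total += 1
            passed += follows_loosely(response, check)
    return 100.0 * passed / total if total else 0.0


# Hypothetical data: two responses, two instruction checks each.
samples = [
    ("Sure, here it is:\nHELLO WORLD",
     [str.isupper, lambda r: len(r.split()) <= 2]),
    ("a short answer",
     [lambda r: r.endswith("."), lambda r: "answer" in r]),
]

print(instruction_level_loose_accuracy(samples))  # 75.0
```

In this toy data the first response fails both checks when scored verbatim but passes once its preamble line is dropped, which is the kind of case a loose score is meant to forgive. Three of the four instructions pass, giving 75.0.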