Instruction Following

Instruction following is a core capability of a large language model. This task evaluates how well a model follows human instructions, with the goal of producing answers that are controllable and safe.

Papers

Showing 441–450 of 1135 papers

Title	Date	Tasks	Status	Hype
RMM: A Recursive Mental Model for Dialogue Navigation	Nov 1, 2020	Answer Generation, Instruction Following	Code Available	1
AllenAct: A Framework for Embodied AI Research	Aug 28, 2020	Deep Reinforcement Learning, Embodied Question Answering	Code Available	1
RMM: A Recursive Mental Model for Dialog Navigation	May 2, 2020	Answer Generation, Instruction Following	Code Available	1
Zero-Shot Compositional Policy Learning via Language Grounding	Apr 15, 2020	Descriptive, Domain Adaptation	Code Available	1
Learning to Map Natural Language Instructions to Physical Quadcopter Control using Simulated Flight	Oct 21, 2019	Continuous Control	Code Available	1
Following High-level Navigation Instructions on a Simulated Quadcopter with Imitation Learning	May 31, 2018	Imitation Learning, Instruction Following	Code Available	1
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning	Jul 17, 2025	Instruction Following	Unverified	0
How Many Instructions Can LLMs Follow at Once?	Jul 15, 2025	Instruction Following	Unverified	0
Multilingual Multimodal Software Developer for Code Generation	Jul 11, 2025	Code Generation, Instruction Following	Unverified	0
TuneShield: Mitigating Toxicity in Conversational AI while Fine-tuning on Untrusted Data	Jul 8, 2025	Chatbot, Instruction Following	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified