Instruction Following

Instruction following is a fundamental capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe answers.

Papers

Showing 1101–1110 of 1135 papers

Title	Date	Tasks	Status	Hype
AllenAct: A Framework for Embodied AI Research	Aug 28, 2020	Deep Reinforcement Learning, Embodied Question Answering	Code Available	1
Inverse Reinforcement Learning with Natural Language Goals	Aug 16, 2020	Friction, Instruction Following	Unverified	0
Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation	Jul 11, 2020	Decision Making, Imitation Learning	Unverified	0
Language-Conditioned Goal Generation: a New Approach to Language Grounding in RL	Jun 12, 2020	Diversity, Instruction Following	Unverified	0
Language-Conditioned Goal Generation: a New Approach to Language Grounding for RL	Jun 12, 2020	Diversity, Instruction Following	Unverified	0
Human Instruction-Following with Deep Reinforcement Learning via Transfer-Learning from Text	May 19, 2020	Deep Reinforcement Learning, Instruction Following	Unverified	0
Language Conditioned Imitation Learning over Unstructured Data	May 15, 2020	Continuous Control	Unverified	0
RMM: A Recursive Mental Model for Dialog Navigation	May 2, 2020	Answer Generation, Instruction Following	Code Available	1
Zero-Shot Compositional Policy Learning via Language Grounding	Apr 15, 2020	Descriptive, Domain Adaptation	Code Available	1
Following Instructions by Imagining and Reaching Visual Goals	Jan 25, 2020	Instruction Following, Reinforcement Learning	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified