Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of generating controllable and safe answers.

Papers

Showing 1061–1070 of 1135 papers

Title	Date	Tasks	Status	Hype
Language Models are General-Purpose Interfaces	Jun 13, 2022	Causal Language Modeling, Few-Shot Learning	Unverified	0
GoalNet: Inferring Conjunctive Goal Predicates from Human Plan Demonstrations for Robot Instruction Following	May 14, 2022	Decision Making, Instruction Following	Code Available	0
Engineering flexible machine learning systems by traversing functionally-invariant paths	Apr 30, 2022	Adversarial Robustness, Continual Learning	Code Available	1
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks	Apr 16, 2022	Benchmarking, Instruction Following	Code Available	3
Inferring Rewards from Language in Context	Apr 5, 2022	Instruction Following, Reinforcement Learning (RL)	Code Available	1
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation	Mar 30, 2022	counterfactual, Data Augmentation	Code Available	1
Summarizing a virtual robot's past actions in natural language	Mar 13, 2022	Instruction Following	Unverified	0
Combining Modular Skills in Multitask Learning	Feb 28, 2022	Instruction Following, reinforcement-learning	Code Available	1
DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following	Feb 27, 2022	Instruction Following, Navigate	Code Available	1
Compositionality as Lexical Symmetry	Jan 30, 2022	Data Augmentation, Inductive Bias	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified