Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 651–660 of 1135 papers

Title	Date	Tasks	Status
Modular Networks for Compositional Instruction Following	Oct 24, 2020	Instruction Following	Unverified
Can Large Language Models Understand Symbolic Graphics Programs?	Aug 15, 2024	Instruction Following, Program Synthesis	Unverified
Traffic Sign Interpretation in Real Road Scene	Nov 17, 2023	Instruction Following, Multi-Task Learning	Unverified
CamelEval: Advancing Culturally Aligned Arabic Language Models and Benchmarks	Sep 19, 2024	Instruction Following, Open-Ended Question Answering	Unverified
Training an LLM-as-a-Judge Model: Pipeline, Insights, and Practical Lessons	Feb 5, 2025	Instruction Following, Knowledge Distillation	Unverified
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation	Jun 17, 2023	Decision Making, Instruction Following	Unverified
MrSteve: Instruction-Following Agents in Minecraft with What-Where-When Memory	Nov 11, 2024	Instruction Following, Minecraft	Unverified
MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following	Dec 5, 2023	Instruction Following	Unverified
CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks	Apr 29, 2025	Instruction Following	Unverified
Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks	May 19, 2025	Instruction Following	Unverified

Benchmark Results

Rank	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified