Instruction Following

Instruction following is a fundamental capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 191–200 of 1135 papers

Title	Date	Tasks	Status	Hype
CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom	Mar 3, 2025	Instruction Following	Code Available	1
In-context Learning vs. Instruction Tuning: The Case of Small and Multilingual Language Models	Mar 3, 2025	In-Context Learning, Instruction Following	Unverified	0
Re-Imagining Multimodal Instruction Tuning: A Representation View	Mar 2, 2025	Instruction Following, MME	Code Available	0
Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective	Feb 28, 2025	Instruction Following	Unverified	0
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge	Feb 27, 2025	GSM8K, HumanEval	Unverified	0
DataMan: Data Manager for Pre-training Large Language Models	Feb 26, 2025	In-Context Learning, Instruction Following	Unverified	0
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems	Feb 26, 2025	Instruction Following	Code Available	2
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models	Feb 26, 2025	Instruction Following, Vision-Language-Action	Unverified	0
Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments	Feb 26, 2025	Instruction Following, Vision and Language Navigation	Unverified	0
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation	Feb 26, 2025	Benchmarking, Code Generation	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
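
The table above does not define "Inst-level loose-accuracy". The name matches a metric used by the IFEval benchmark, so the sketch below assumes that reading: every instruction attached to a prompt is checked separately, a response counts as following an instruction if the original text or any relaxed variant of it (first line dropped, last line dropped, asterisks removed) passes the check, and the score is the fraction of instructions followed across all prompts. The function names and the exact set of variants are illustrative assumptions, not the benchmark's reference code.

```python
from typing import Callable, List, Sequence, Tuple

# A checker takes a response string and reports whether one instruction was followed.
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the response plus relaxed variants (assumed set, see lead-in)."""
    lines = response.split("\n")
    base = [
        response,
        "\n".join(lines[1:]).strip(),    # drop a possible intro line
        "\n".join(lines[:-1]).strip(),   # drop a possible outro line
        "\n".join(lines[1:-1]).strip(),  # drop both
    ]
    # Also try each variant with markdown emphasis asterisks removed.
    return base + [v.replace("*", "") for v in base]


def follows_loosely(response: str, check: Checker) -> bool:
    """True if any non-empty variant of the response passes the check."""
    return any(bool(v.strip()) and check(v) for v in loose_variants(response))


def inst_level_loose_accuracy(
    samples: Sequence[Tuple[str, Sequence[Checker]]]
) -> float:
    """Fraction of individual instructions followed, pooled over all samples."""
    total = 0
    passed = 0
    for response, checks in samples:
        for check in checks:
            total += 1
            passed += int(follows_loosely(response, check))
    return passed / total if total else 0.0
```

Under this reading, a response such as "Sure, here it is:" followed by an all-caps line would fail a strict all-caps check but pass the loose one, because the variant without the first line satisfies the instruction. A prompt-level score would instead require every instruction of a prompt to pass before counting that prompt.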