Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal that its responses are controllable and safe.

Papers

Showing 971–980 of 1135 papers

Title	Date	Tasks	Status
Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation	May 27, 2024	Instruction Following, Language Modeling	Unverified
Self-driven Grounding: Large Language Model Agents with Automatical Language-aligned Skill Learning	Sep 4, 2023	Imitation Learning, Instruction Following	Unverified
Self-Educated Language Agent with Hindsight Experience Replay for Instruction Following	Sep 25, 2019	Instruction Following, Language Acquisition	Unverified
HIGhER : Improving instruction following with Hindsight Generation for Experience Replay	Oct 21, 2019	Instruction Following, Language Acquisition	Unverified
Identifying Reliable Evaluation Metrics for Scientific Text Revision	Jun 5, 2025	Instruction Following	Code Available
Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning	Mar 21, 2024	In-Context Learning, Instruction Following	Code Available
ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning	Mar 14, 2025	Code Generation, Decoder	Code Available
IFShip: Interpretable Fine-grained Ship Classification with Domain Knowledge-Enhanced Vision-Language Models	Aug 13, 2024	Chatbot, Instruction Following	Code Available
CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions	Oct 4, 2024	Instruction Following, MMLU	Code Available
PACIT: Unlocking the Power of Examples for Better In-Context Instruction Tuning	Oct 2, 2023	Instruction Following, Zero-shot Generalization	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
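
The metric name in the table matches the convention used by the IFEval benchmark, although the listing itself does not say which benchmark these numbers come from. Assuming that convention, "instruction-level" means each individual instruction counts once (rather than each prompt), and "loose" means a response passes if any lightly relaxed variant of it passes the check. The sketch below illustrates that idea only; the function names, the checker signature, and the exact set of relaxations are assumptions for illustration, not the official evaluation code.

```python
from typing import Callable, List, Tuple

# Hypothetical checker type: takes a response, returns True if the
# instruction was followed. Real benchmarks ship their own checkers.
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the response plus relaxed variants (assumed relaxations:
    drop the first line, the last line, both, and markdown asterisks)."""
    lines = response.split("\n")
    base = [
        response,
        "\n".join(lines[1:]).strip(),    # without first line
        "\n".join(lines[:-1]).strip(),   # without last line
        "\n".join(lines[1:-1]).strip(),  # without first and last line
    ]
    # Also try every variant with markdown asterisks removed.
    return base + [v.replace("*", "") for v in base]


def follows_loosely(response: str, check: Checker) -> bool:
    """An instruction counts as followed if any non-empty variant passes."""
    return any(v.strip() and check(v) for v in loose_variants(response))


def inst_level_loose_accuracy(samples: List[Tuple[str, List[Checker]]]) -> float:
    """Percentage of individual instructions followed across all samples.

    Each sample is (response, checkers), one checker per instruction
    attached to the prompt that produced the response.
    """
    followed = 0
    total = 0
    for response, checks in samples:
        for check in checks:
            total += 1
            followed += int(follows_loosely(response, check))
    return 100.0 * followed / total if total else 0.0
```

Under this reading, a model that adds a chatty first line before an otherwise compliant answer is still credited, which is why loose accuracy is typically at least as high as the strict variant.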