Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 551–560 of 1135 papers

Title	Date	Tasks	Code	Score
MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output	Jan 1, 2025	Instruction Following, Language Modeling	Code Available	5
Chasing Ghosts: Instruction Following as Bayesian State Tracking	Jul 3, 2019	Instruction Following, Vision and Language Navigation	Code Available	5
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators	Apr 21, 2025	Code Generation, Instruction Following	Code Available	5
AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation	Feb 16, 2024	Instruction Following	Code Available	5
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following	Oct 30, 2024	Articles, Instruction Following	Code Available	5
CASTILLO: Characterizing Response Length Distributions of Large Language Models	May 22, 2025	Instruction Following, Language Modeling	Code Available	5
Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning	May 28, 2024	Domain Adaptation, Instruction Following	Code Available	5
Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction	Sep 4, 2018	Action Generation, Conditional Image Generation	Code Available	5
Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations	Aug 27, 2023	Instruction Following, MMLU	Code Available	5
Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction	Nov 10, 2018	continuous-control, Continuous Control	Code Available	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
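
The page does not define the "Inst-level loose-accuracy" metric used in the table above. As a rough sketch, assuming it follows the usual convention in instruction-following benchmarks such as IFEval, each prompt carries one or more automatically checkable instructions, accuracy is counted per instruction rather than per prompt, and the "loose" variant accepts a response if any lightly relaxed form of it passes the check. The function names and the exact set of relaxations below are illustrative assumptions, not the benchmark's own code.

```python
from typing import Callable, List, Tuple

# A check is a hypothetical predicate: does this response satisfy one instruction?
Check = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return relaxed views of a response (assumed relaxations):
    markdown asterisks stripped, and the first and/or last line dropped."""
    variants = []
    for text in (response, response.replace("*", "")):
        lines = text.split("\n")
        variants.append(text)
        variants.append("\n".join(lines[1:]))    # drop a leading preamble line
        variants.append("\n".join(lines[:-1]))   # drop a trailing sign-off line
        variants.append("\n".join(lines[1:-1]))  # drop both
    return variants


def followed_loosely(response: str, check: Check) -> bool:
    """An instruction counts as followed if any relaxed variant passes."""
    return any(check(v.strip()) for v in loose_variants(response))


def inst_level_loose_accuracy(samples: List[Tuple[str, List[Check]]]) -> float:
    """Percentage of individual instructions followed, pooled over all prompts."""
    total = sum(len(checks) for _, checks in samples)
    if total == 0:
        return 0.0
    passed = sum(
        followed_loosely(response, check)
        for response, checks in samples
        for check in checks
    )
    return 100.0 * passed / total


# Toy data: two responses with three instructions in total.
samples = [
    ("Sure, here it is:\n**hello world**",
     [lambda r: r.islower(), lambda r: len(r.split()) <= 2]),
    ("HELLO", [lambda r: r.islower()]),
]
score = inst_level_loose_accuracy(samples)
```

In the toy data, the first response passes both of its checks only once the preamble line is dropped, which is what the loose criterion allows; the second response fails its single check under every variant, so two of three instructions count as followed.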