Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the aim of producing controllable and safe responses.

Papers

Showing 221–230 of 1135 papers

Title	Date	Tasks	Status	Hype	Score
Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM	Apr 24, 2023	Instruction Following, Language Modelling	Code Available	1	5
Large Language Models as Evaluators for Recommendation Explanations	Jun 5, 2024	Common Sense Reasoning, Instruction Following	Code Available	1	5
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations	Oct 24, 2024	Instruction Following, Question Answering	Code Available	1	5
AlpaGasus: Training A Better Alpaca with Fewer Data	Jul 17, 2023	Instruction Following	Code Available	1	5
DANLI: Deliberative Agent for Following Natural Language Instructions	Oct 22, 2022	Instruction Following, Vision-Language Navigation	Code Available	1	5
AlpaCare:Instruction-tuned Large Language Models for Medical Application	Oct 23, 2023	Diversity, Instruction Following	Code Available	1	5
GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing	May 16, 2025	Instruction Following, Multiple-choice	Code Available	1	5
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits	Oct 2, 2024	Instruction Following, Math	Code Available	1	5
From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces	May 31, 2023	Instruction Following	Code Available	1	5
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning	Sep 30, 2023	Instruction Following, Language Modeling	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified