Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the aim that its responses are controllable and safe.

Papers

Showing 81–90 of 1135 papers

Title	Date	Tasks	Status	Hype
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition	Oct 9, 2023	Code Generation, Instruction Following	Code Available	3
How Can Recommender Systems Benefit from Large Language Models: A Survey	Jun 9, 2023	Ethics, Feature Engineering	Code Available	3
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback	May 22, 2023	Instruction Following	Code Available	3
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans	May 8, 2023	Instruction Following, Language Modeling	Code Available	3
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages	May 7, 2023	Attribute, Instruction Following	Code Available	3
Panda LLM: Training Data and Evaluation for Open-Sourced Chinese Instruction-Following Large Language Models	May 4, 2023	Instruction Following	Code Available	3
Caption Anything: Interactive Image Description with Diverse Multimodal Controls	May 4, 2023	controllable image captioning, Image Captioning	Code Available	3
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks	Apr 16, 2022	Benchmarking, Instruction Following	Code Available	3
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering	Jul 15, 2025	Benchmarking, Instruction Following	Code Available	2
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment	Jul 3, 2025	cross-modal alignment, Instruction Following	Code Available	2

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
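
The table above does not define its metric. As a rough illustration only, the sketch below assumes that "Inst-level loose-accuracy" means the percentage of individual instructions satisfied across all prompts, where an instruction counts as satisfied if any relaxed form of the response passes its check. The relaxations used here (dropping the first or last line, stripping asterisks) and the names `loose_variants` and `instruction_level_accuracy` are assumptions made for this example, not taken from any benchmark's code.

```python
from typing import Callable, List, Sequence

# A checker returns True when a response satisfies one instruction.
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the response plus relaxed forms of it (assumed relaxations)."""
    lines = response.split("\n")
    candidates = [
        response,
        "\n".join(lines[1:]),    # drop a possible intro line
        "\n".join(lines[:-1]),   # drop a possible closing line
        "\n".join(lines[1:-1]),  # drop both
    ]
    # Also try each candidate with markdown asterisks stripped.
    return candidates + [c.replace("*", "") for c in candidates]


def instruction_level_accuracy(
    responses: Sequence[str],
    checkers_per_response: Sequence[Sequence[Checker]],
    loose: bool = True,
) -> float:
    """Percentage of individual instructions that were followed."""
    followed = 0
    total = 0
    for response, checkers in zip(responses, checkers_per_response):
        variants = loose_variants(response) if loose else [response]
        for check in checkers:
            total += 1
            # Loose scoring: one passing variant is enough.
            if any(check(v) for v in variants):
                followed += 1
    return 100.0 * followed / total if total else 0.0
```

With `loose=False` only the raw response is checked, so the loose score can never be lower than the strict one on the same data.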