SOTAVerified

Instruction Following

Instruction following is a fundamental capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 901–910 of 1135 papers

| Title | Status | Hype |
|---|---|---|
| Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations | | 0 |
| Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking | | 0 |
| Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | | 0 |
| FLAME: Factuality-Aware Alignment for Large Language Models | | 0 |
| Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning | | 0 |
| Domain Adaptation of VLM for Soccer Video Understanding | | 0 |
| FlowKV: Enhancing Multi-Turn Conversational Coherence in LLMs via Isolated Key-Value Cache Management | | 0 |
| GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents | | 0 |
| FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models | | 0 |
| StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AutoIF (Llama3 70B) | Inst-level loose-accuracy | 90.4 | | Unverified |
| 2 | AutoIF (Qwen2 72B) | Inst-level loose-accuracy | 88 | | Unverified |
| 3 | GPT-4 | Inst-level loose-accuracy | 85.37 | | Unverified |
| 4 | PaLM 2 S | Inst-level loose-accuracy | 59.11 | | Unverified |
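The "Inst-level loose-accuracy" metric in the table above appears to follow the IFEval convention: each prompt carries several automatically verifiable instructions, the instruction-level score is the fraction of individual instructions satisfied, and the "loose" variant also accepts relaxed transformations of the response (e.g., markdown stripped, first or last line removed). A minimal sketch of that scoring logic, with hypothetical checker functions standing in for the real verifiers:

```python
def loose_variants(response: str) -> list[str]:
    """Relaxed transformations of a response, in the spirit of loose
    scoring: strip markdown asterisks and drop the first or last line
    (which often contain boilerplate like "Sure, here it is:")."""
    lines = response.splitlines()
    variants = [response, response.replace("*", "")]
    if len(lines) > 1:
        variants.append("\n".join(lines[1:]))   # drop first line
        variants.append("\n".join(lines[:-1]))  # drop last line
    return variants

def inst_level_loose_accuracy(examples, checkers):
    """examples: list of (response, [instruction_id, ...]) pairs.
    checkers: dict mapping instruction_id -> predicate(str) -> bool.
    Returns the percentage of individual instructions that at least
    one loose variant of the response satisfies."""
    followed = total = 0
    for response, instruction_ids in examples:
        variants = loose_variants(response)
        for iid in instruction_ids:
            total += 1
            if any(checkers[iid](v) for v in variants):
                followed += 1
    return 100.0 * followed / total if total else 0.0

# Hypothetical checkers and responses, for illustration only.
checkers = {
    "lowercase": lambda s: s == s.lower(),
    "min_3_words": lambda s: len(s.split()) >= 3,
}
examples = [
    ("Sure, here it is:\nall lowercase text", ["lowercase", "min_3_words"]),
    ("TOO LOUD", ["lowercase"]),
]
print(round(inst_level_loose_accuracy(examples, checkers), 2))  # → 66.67
```

In the first example the original response fails the lowercase check, but the variant with the first line dropped passes it, which is exactly the gap between strict and loose accuracy.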