Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 141–150 of 1135 papers

Title	Date	Tasks	Status	Hype
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models	Apr 9, 2025	Instruction Following, Mathematical Problem-Solving	Unverified	0
Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning	Apr 9, 2025	Continual Learning, Decoder	Code Available	1
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models	Apr 8, 2025	In-Context Learning, Instruction Following	Unverified	0
Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations	Apr 8, 2025	Instruction Following, Mixture-of-Experts	Unverified	0
Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators	Apr 8, 2025	Instruction Following	Unverified	0
Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models	Apr 7, 2025	Dialogue Evaluation, Fairness	Code Available	2
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning	Apr 3, 2025	Image Generation, Instruction Following	Code Available	3
CrystalFormer-RL: Reinforcement Fine-Tuning for Materials Design	Apr 3, 2025	Band Gap, Dielectric Constant	Code Available	2
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection	Apr 3, 2025	Instruction Following, Language Modeling	Code Available	1
The Hidden Space of Safety: Understanding Preference-Tuned LLMs in Multilingual context	Apr 3, 2025	Instruction Following	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
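
The table does not define "Inst-level loose-accuracy". Assuming it follows the common usage in instruction-following evaluation (the share of individual instructions satisfied across all prompts, where a response may pass under a few relaxed variants of itself), a minimal sketch might look like the following. The function names, the particular relaxations, and the sample checks are illustrative assumptions, not the benchmark's actual implementation.

```python
from typing import Callable, List, Tuple

# A check takes a response string and returns True when one instruction is satisfied.
Check = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return relaxed versions of a response (assumed relaxations, for illustration)."""
    lines = response.split("\n")
    no_first = "\n".join(lines[1:])   # drop a leading line such as "Sure, here it is:"
    no_last = "\n".join(lines[:-1])   # drop a trailing line such as a sign-off
    variants = [response, no_first, no_last]
    # Also try each variant with markdown emphasis markers removed.
    variants += [v.replace("*", "") for v in variants]
    return variants


def follows_loosely(response: str, check: Check) -> bool:
    """An instruction counts as followed if any relaxed variant passes its check."""
    return any(check(v) for v in loose_variants(response))


def instruction_level_loose_accuracy(samples: List[Tuple[str, List[Check]]]) -> float:
    """Percentage of individual instructions followed, pooled over all responses."""
    total = 0
    followed = 0
    for response, checks in samples:
        for check in checks:
            total += 1
            followed += follows_loosely(response, check)
    return 100.0 * followed / total if total else 0.0


# Hypothetical data: two responses, each paired with two verifiable instructions.
samples = [
    ("Sure, here it is:\nhello world",
     [lambda r: r == r.lower(), lambda r: len(r.split()) <= 2]),
    ("**DONE**",
     [lambda r: r.startswith("DONE"), lambda r: "never" in r]),
]
print(instruction_level_loose_accuracy(samples))
```

Because the score pools instructions rather than prompts, a response that satisfies one of its two instructions still earns partial credit; a prompt-level variant would instead require every instruction attached to a prompt to pass.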