Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal that its answers are controllable and safe.

Papers

Showing 571–580 of 1135 papers

Title	Date	Tasks	Status
XIFBench: Evaluating Large Language Models on Multilingual Instruction Following	Mar 10, 2025	Instruction Following, Specificity	Unverified
Dr Genre: Reinforcement Learning from Decoupled LLM Feedback for Generic Text Rewriting	Mar 9, 2025	Instruction Following, Large Language Model	Unverified
WildIFEval: Instruction Following in the Wild	Mar 9, 2025	Instruction Following	Code Available
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information	Mar 7, 2025	Instruction Following	Unverified
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval	Mar 6, 2025	Information Retrieval, Instruction Following	Unverified
Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment	Mar 6, 2025	Instruction Following, Transfer Learning	Code Available
Unified Mind Model: Reimagining Autonomous Agents in the LLM Era	Mar 5, 2025	Instruction Following	Unverified
CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation	Mar 5, 2025	Code Generation, Instruction Following	Unverified
Robust Learning of Diverse Code Edits	Mar 5, 2025	Code Generation, Instruction Following	Unverified
LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach	Mar 5, 2025	Instruction Following, Math	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
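
The table does not define its metric. As a rough illustration only, the sketch below assumes that "Inst-level loose-accuracy" follows a common convention: every instruction attached to every prompt is scored separately (instruction level), and an instruction counts as followed if the response passes its check either as written or after mild relaxations such as dropping the first or last line or removing markdown asterisks (loose). The function names, the choice of relaxations, and the checker signature are all assumptions made for this example, not taken from any listed paper.

```python
from typing import Callable, List, Sequence

# A checker is a hypothetical predicate: does this text satisfy one instruction?
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return relaxed views of a response (assumed set of relaxations)."""
    lines = response.split("\n")
    bases = [
        response,                 # unchanged
        "\n".join(lines[1:]),     # without the first line (e.g. a preamble)
        "\n".join(lines[:-1]),    # without the last line (e.g. a sign-off)
        "\n".join(lines[1:-1]),   # without both
    ]
    variants: List[str] = []
    for text in bases:
        variants.append(text)
        variants.append(text.replace("*", ""))  # strip markdown emphasis
    return variants


def followed_loosely(response: str, checker: Checker) -> bool:
    """True if any non-empty relaxed variant passes the checker."""
    return any(v.strip() and checker(v) for v in loose_variants(response))


def instruction_level_loose_accuracy(
    responses: Sequence[str],
    checkers_per_response: Sequence[Sequence[Checker]],
) -> float:
    """Percentage of individual instructions followed, pooled over all prompts."""
    total = 0
    passed = 0
    for response, checkers in zip(responses, checkers_per_response):
        for checker in checkers:
            total += 1
            passed += int(followed_loosely(response, checker))
    return 100.0 * passed / total if total else 0.0


# Toy data: two responses carrying three instructions in total.
example_responses = ["Sure, here you go:\nhello world", "HELLO"]
example_checkers = [
    [lambda r: r == "hello world", lambda r: len(r.split()) <= 2],
    [lambda r: r.islower()],
]
example_score = instruction_level_loose_accuracy(example_responses, example_checkers)
```

In this toy data the first response satisfies both of its instructions only once the preamble line is dropped, while the second response fails its single instruction, so two of three instructions count as followed. Under a strict reading, with no relaxations, none of the three would pass.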