Instruction Following

Instruction following is a fundamental capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 151–160 of 1135 papers

Title	Date	Tasks	Status	Hype
Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning	Mar 31, 2025	General Reinforcement Learning, Instruction Following	Code Available	2
Effectively Controlling Reasoning Models through Thinking Intervention	Mar 31, 2025	Instruction Following, Safety Alignment	Unverified	0
Pay More Attention to the Robustness of Prompt for Instruction Data Mining	Mar 31, 2025	Instruction Following	Unverified	0
Learning to Instruct for Visual Instruction Tuning	Mar 28, 2025	Hallucination, Instruction Following	Unverified	0
InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction	Mar 26, 2025	Instruction Following, Video Editing	Code Available	1
Qwen2.5-Omni Technical Report	Mar 26, 2025	Automatic Speech Recognition (ASR), GSM8K	Code Available	7
Gemma 3 Technical Report	Mar 25, 2025	Instruction Following, Math	Unverified	0
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild	Mar 24, 2025	Instruction Following, Math	Code Available	7
OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence	Mar 20, 2025	Instruction Following, Natural Language Understanding	Unverified	0
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings	Mar 19, 2025	Instruction Following, Large Language Model	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
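
The metric in the table above is not defined on this page. As a rough sketch, assuming it follows the common IFEval-style definition, instruction-level accuracy is the percentage of individual instructions that were followed, pooled across all prompts, while "loose" means each response is also checked under a few relaxed variants (for example, with markdown emphasis markers or the first or last line removed) and counts as followed if any variant passes. The function names and the input layout below are hypothetical, chosen only for illustration; the per-instruction pass/fail checks are assumed to have been run already.

```python
from typing import List


def instruction_level_accuracy(results: List[List[bool]]) -> float:
    """Percentage of individual instructions followed, pooled over all prompts.

    `results[i][j]` is True if instruction j of prompt i was judged as followed.
    This layout is an assumption made for the sketch, not a standard format.
    """
    total = sum(len(per_prompt) for per_prompt in results)
    if total == 0:
        raise ValueError("no instructions to score")
    followed = sum(sum(per_prompt) for per_prompt in results)
    return 100.0 * followed / total


def prompt_level_accuracy(results: List[List[bool]]) -> float:
    """Percentage of prompts for which every instruction was followed."""
    if not results:
        raise ValueError("no prompts to score")
    fully_followed = sum(all(per_prompt) for per_prompt in results)
    return 100.0 * fully_followed / len(results)


# Hypothetical outcomes for three prompts with 2, 3, and 1 instructions.
example = [[True, True], [True, False, True], [False]]
inst_acc = instruction_level_accuracy(example)   # 4 of 6 instructions followed
prompt_acc = prompt_level_accuracy(example)      # 1 of 3 prompts fully followed
```

Instruction-level scores are never lower than prompt-level scores on the same outcomes, since a prompt with one failed instruction still contributes its followed instructions to the pooled count.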