SOTAVerified

Instruction Following

Instruction following is a fundamental capability of large language models. This task evaluates how well a model follows human instructions, with the goal of generating controllable and safe answers.

Papers

Showing 21–30 of 1135 papers

| Title | Status | Hype |
|---|---|---|
| HalLoc: Token-level Localization of Hallucinations for Vision Language Models | Code | 0 |
| AC/DC: LLM-based Audio Comprehension via Dialogue Continuation | | 0 |
| Magistral | | 0 |
| Conversational Search: From Fundamentals to Frontiers in the LLM Era | | 0 |
| Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning | Code | 0 |
| Alzheimer's Dementia Detection Using Perplexity from Paired Large Language Models | | 0 |
| VerIF: Verification Engineering for Reinforcement Learning in Instruction Following | Code | 2 |
| EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models | Code | 0 |
| RHealthTwin: Towards Responsible and Multimodal Digital Twins for Personalized Well-being | | 0 |
| LLaVA-c: Continual Improved Visual Instruction Tuning | | 0 |
Page 3 of 114

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | AutoIF (Llama3 70B) | Inst-level loose-accuracy | 90.4 | — | Unverified |
| 2 | AutoIF (Qwen2 72B) | Inst-level loose-accuracy | 88 | — | Unverified |
| 3 | GPT-4 | Inst-level loose-accuracy | 85.37 | — | Unverified |
| 4 | PaLM 2 S | Inst-level loose-accuracy | 59.11 | — | Unverified |
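The instruction-level accuracy reported above aggregates pass/fail outcomes over individual instruction checks rather than over whole prompts. The sketch below is a minimal illustration of that aggregation, assuming the IFEval-style convention; the function name and data layout are hypothetical, and "loose" evaluation (applying each check to relaxed variants of the response) is noted but not implemented here.

```python
# Hedged sketch of instruction-level accuracy (IFEval-style convention,
# assumed): each prompt carries one or more instruction checks, and the
# metric is the fraction of *checks* passed across all prompts. In the
# "loose" variant, a check typically passes if any relaxed form of the
# response (e.g. with markdown or boilerplate stripped) satisfies it;
# that relaxation is not modeled in this sketch.

def inst_level_accuracy(results):
    """results: list of per-prompt lists of booleans, one per instruction check.
    Returns the percentage of instruction checks passed."""
    flags = [ok for prompt in results for ok in prompt]
    return 100.0 * sum(flags) / len(flags)

# Example: three prompts with 2, 1, and 3 instruction checks each;
# 4 of 6 checks pass, so the score is about 66.67.
score = inst_level_accuracy([[True, False], [True], [True, True, False]])
```

Note that prompt-level accuracy (all checks in a prompt must pass) is stricter and would give 1/3 on the same example, which is why the two metrics are reported separately on leaderboards like this one.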