SOTAVerified

Instruction Following

Instruction following is a fundamental capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 761–770 of 1135 papers

| Title | Status | Hype |
| --- | --- | --- |
| Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags | | 0 |
| Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models | | 0 |
| Rethinking Predictive Modeling for LLM Routing: When Simple kNN Beats Complex Learned Routers | | 0 |
| Retrieval Augmented Chest X-Ray Report Generation using OpenAI GPT models | | 0 |
| URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models | | 0 |
| RevisEval: Improving LLM-as-a-Judge via Response-Adapted References | | 0 |
| Revisiting the Superficial Alignment Hypothesis | | 0 |
| Shuttle Between the Instructions and the Parameters of Large Language Models | | 0 |
| A Systematic Examination of Preference Learning through the Lens of Instruction-Following | | 0 |
| A Survey of Reinforcement Learning Informed by Natural Language | | 0 |
Page 77 of 114

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | AutoIF (Llama3 70B) | Inst-level loose-accuracy | 90.4 | | Unverified |
| 2 | AutoIF (Qwen2 72B) | Inst-level loose-accuracy | 88 | | Unverified |
| 3 | GPT-4 | Inst-level loose-accuracy | 85.37 | | Unverified |
| 4 | PaLM 2 S | Inst-level loose-accuracy | 59.11 | | Unverified |
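The metric above, instruction-level accuracy, is typically computed as the fraction of individual verifiable instructions a model satisfies, pooled across all prompts (a prompt may carry several instructions). The sketch below, a minimal illustration and not the benchmark's actual implementation, assumes per-instruction pass/fail flags have already been obtained; the "loose" variant applies the same formula to flags produced after relaxed response normalization (e.g. stripping markdown or boilerplate lines).

```python
def inst_level_accuracy(results):
    """Instruction-level accuracy as a percentage.

    `results` is a list with one entry per prompt; each entry is a list of
    booleans, one per instruction in that prompt (True = instruction
    followed). All instructions are pooled, so prompts with more
    instructions weigh more, which distinguishes this from prompt-level
    accuracy (where a prompt counts only if *all* its instructions pass).
    """
    # Flatten the per-prompt flags into one pool of instructions.
    flags = [f for prompt_flags in results for f in prompt_flags]
    return 100.0 * sum(flags) / len(flags) if flags else 0.0

# Hypothetical example: 3 prompts with 2, 1, and 3 instructions; 4 of 6 pass.
print(round(inst_level_accuracy([[True, False], [True], [True, True, False]]), 2))
```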