Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 121–130 of 1135 papers

Title	Date	Tasks	Status	Hype
UAV-VLN: End-to-End Vision Language guided Navigation for UAVs	Apr 30, 2025	Common Sense Reasoning, Instruction Following	Unverified	0
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models	Apr 29, 2025	Benchmarking, Dataset Generation	Code Available	0
CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks	Apr 29, 2025	Instruction Following	Unverified	0
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs	Apr 24, 2025	Image-text Retrieval, Instruction Following	Unverified	0
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost	Apr 23, 2025	Instruction Following, Language Modeling	Unverified	0
ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance	Apr 23, 2025	Instruction Following, SSIM	Unverified	0
Case Study: Fine-tuning Small Language Models for Accurate and Private CWE Detection in Python Code	Apr 23, 2025	Instruction Following, Privacy Preserving	Unverified	0
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction	Apr 22, 2025	Diversity, Domain Adaptation	Code Available	1
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators	Apr 21, 2025	Code Generation, Instruction Following	Code Available	0
DistilQwen2.5: Industrial Practices of Training Distilled Open Lightweight Language Models	Apr 21, 2025	Computational Efficiency, Instruction Following	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
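
The table reports "Inst-level loose-accuracy" without defining it. The sketch below shows one way such a score could be computed, assuming it follows the common convention where every instruction across all prompts is counted individually and a "loose" check passes if the original response or a lightly relaxed variant of it satisfies the instruction. The function names, the checker interface, and the exact set of relaxations are illustrative assumptions, not taken from this page.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a checker returns True if a response satisfies
# one verifiable instruction (e.g. "answer in all capital letters").
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the response plus relaxed variants (assumed relaxations:
    drop the first line, the last line, or both, and strip '*' markers)."""
    lines = response.split("\n")
    bases = [
        response,
        "\n".join(lines[1:]).strip(),    # without a leading preamble line
        "\n".join(lines[:-1]).strip(),   # without a trailing sign-off line
        "\n".join(lines[1:-1]).strip(),  # without both
    ]
    return bases + [b.replace("*", "") for b in bases]


def instruction_level_loose_accuracy(
    examples: Sequence[Tuple[str, Sequence[Checker]]]
) -> float:
    """Percentage of individual instructions followed, pooled over all
    (response, checkers) pairs. Returns 0.0 when there are no instructions."""
    followed = 0
    total = 0
    for response, checkers in examples:
        variants = [v for v in loose_variants(response) if v.strip()]
        for check in checkers:
            total += 1
            if any(check(v) for v in variants):
                followed += 1
    return 100.0 * followed / total if total else 0.0
```

Under this reading, a response that opens with a conversational preamble can still receive credit if the remainder satisfies the instruction, which is why loose scores are typically at least as high as strict ones.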