Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 581–590 of 1135 papers

Title	Date	Tasks	Status	Score
Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages	Jun 5, 2024	Instruction Following, Retrieval	Code Available	5
Guiding Policies with Language via Meta-Learning	Nov 19, 2018	Imitation Learning, Instruction Following	Code Available	5
Token-Efficient Leverage Learning in Large Language Models	Apr 1, 2024	Instruction Following, Translation	Code Available	5
HalLoc: Token-level Localization of Hallucinations for Vision Language Models	Jun 12, 2025	Hallucination, Image Captioning	Code Available	5
LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios	Nov 11, 2024	Instruction Following	Code Available	5
LIFEBench: Evaluating Length Instruction Following in Large Language Models	May 22, 2025	Instruction Following, Text Generation	Code Available	5
Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families	Dec 9, 2024	Emotional Intelligence, Instruction Following	Code Available	5
IndiVec: An Exploration of Leveraging Large Language Models for Media Bias Detection with Fine-Grained Bias Indicators	Feb 1, 2024	Bias Detection, Instruction Following	Code Available	5
Automated curriculum generation for Policy Gradients from Demonstrations	Dec 1, 2019	Instruction Following	Code Available	5
Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision	Jan 14, 2025	Instruction Following, Math	Code Available	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
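
The table above does not define its metric. The name "Inst-level loose-accuracy" matches the instruction-level loose accuracy reported in IFEval-style evaluations, so the sketch below assumes that meaning: every instruction is scored on its own across all prompts, and an instruction counts as followed if the raw response or any relaxed variant of it passes the check. The relaxation set and the names `loose_variants`, `follows_loosely`, and `inst_level_loose_accuracy` are illustrative assumptions, not the scoring code behind the numbers listed here.

```python
from typing import Callable, List, Sequence, Tuple

# A check receives a response and returns True if the instruction was followed.
Check = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the response plus relaxed variants (assumed relaxation set)."""
    lines = response.split("\n")
    base = [
        response,
        "\n".join(lines[1:]),    # drop a possible leading preamble line
        "\n".join(lines[:-1]),   # drop a possible trailing sign-off line
        "\n".join(lines[1:-1]),  # drop both
    ]
    # Also try every variant with markdown emphasis asterisks removed.
    return base + [variant.replace("*", "") for variant in base]


def follows_loosely(response: str, check: Check) -> bool:
    """An instruction is followed loosely if any variant passes its check."""
    return any(check(variant) for variant in loose_variants(response))


def inst_level_loose_accuracy(
    samples: Sequence[Tuple[str, Sequence[Check]]],
) -> float:
    """Fraction of individual instructions followed, pooled over all prompts."""
    total = 0
    followed = 0
    for response, checks in samples:
        for check in checks:
            total += 1
            followed += follows_loosely(response, check)
    return followed / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical data: two responses with three instructions in total.
    samples = [
        ("**Sure!**\nhello world",
         [lambda r: r.islower(), lambda r: len(r.split()) <= 2]),
        ("HELLO", [lambda r: r.islower()]),
    ]
    print(f"{inst_level_loose_accuracy(samples):.2%}")
```

Under this assumption, a score such as 90.4 would mean that roughly nine in ten individual instructions were judged as followed, which is distinct from the share of prompts whose instructions were all followed.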