Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the aim that its responses are controllable and safe.

Papers

Showing 341–350 of 1135 papers

Title	Date	Tasks	Status	Hype
LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation	Nov 14, 2024	Earth Observation, Instruction Following	Code Available	2
Zero-shot Object-Centric Instruction Following: Integrating Foundation Models with Traditional Navigation	Nov 12, 2024	Instruction Following, Object	Unverified	0
LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios	Nov 11, 2024	Instruction Following	Code Available	0
SetLexSem Challenge: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models	Nov 11, 2024	Instruction Following	Code Available	1
Stronger Models are NOT Stronger Teachers for Instruction Tuning	Nov 11, 2024	Instruction Following	Unverified	0
MrSteve: Instruction-Following Agents in Minecraft with What-Where-When Memory	Nov 11, 2024	Instruction Following, Minecraft	Unverified	0
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization	Nov 9, 2024	Instruction Following	Unverified	0
Fox-1 Technical Report	Nov 8, 2024	2k, 8k	Unverified	0
Bayesian Calibration of Win Rate Estimation with LLM Evaluators	Nov 7, 2024	Bayesian Inference, Instruction Following	Code Available	0
Multi-Reward as Condition for Instruction-based Image Editing	Nov 6, 2024	Descriptive, Instruction Following	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified