Instruction Following

Instruction following is a core capability of a language model. This task evaluates how well large models follow human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 721–730 of 1135 papers

Title	Date	Tasks	Status	Hype
Long-Context Language Modeling with Parallel Context Encoding	Feb 26, 2024	In-Context Learning, Instruction Following	Code Available	2
Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding	Feb 26, 2024	Decoder, Instruction Following	Unverified	0
Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing	Feb 25, 2024	Instruction Following	Code Available	1
GraphWiz: An Instruction-Following Language Model for Graph Problems	Feb 25, 2024	Instruction Following, Language Modeling	Code Available	2
NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation	Feb 24, 2024	Decision Making, Instruction Following	Unverified	0
On the Multi-turn Instruction Following for Conversational Web Agents	Feb 23, 2024	Conversational Web Navigation, Instruction Following	Code Available	1
Unintended Impacts of LLM Alignment on Global Representation	Feb 22, 2024	Instruction Following	Code Available	0
INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models	Feb 22, 2024	Information Retrieval, Instruction Following	Code Available	1
Towards Robust Instruction Tuning on Multimodal Large Language Models	Feb 22, 2024	Instruction Following	Code Available	0
Zero-shot cross-lingual transfer in instruction tuning of large language models	Feb 22, 2024	Cross-Lingual Transfer, Instruction Following	Unverified	0


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified