SOTAVerified

Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 921–930 of 1135 papers

Title	Date	Tasks	Status
Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning	Mar 21, 2024	In-Context Learning, Instruction Following	Code Available
VisualCritic: Making LMMs Perceive Visual Quality Like Humans	Mar 19, 2024	Instruction Following	Unverified
WoLF: Wide-scope Large Language Model Framework for CXR Understanding	Mar 19, 2024	Anatomy, Instruction Following	Unverified
Third-Party Language Model Performance Prediction from Instruction	Mar 19, 2024	Instruction Following, Language Modeling	Code Available
Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning	Mar 15, 2024	Instruction Following	Unverified
Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning	Mar 15, 2024	Hallucination, Instruction Following	Unverified
DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation	Mar 8, 2024	Image Generation, Instruction Following	Unverified
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning	Mar 7, 2024	Instruction Following	Unverified
Aligners: Decoupling LLMs and Alignment	Mar 7, 2024	Instruction Following, Red Teaming	Code Available
KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions	Mar 6, 2024	Instruction Following	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified