Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 891–900 of 1135 papers

Title	Date	Tasks	Status
LLM-AD: Large Language Model based Audio Description System	May 2, 2024	Instruction Following, Language Modeling	Unverified
FLAME: Factuality-Aware Alignment for Large Language Models	May 2, 2024	Hallucination, Instruction Following	Unverified
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation	Apr 30, 2024	Caption Generation, Hallucination	Unverified
HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models	Apr 29, 2024	Instruction Following	Unverified
From Persona to Personalization: A Survey on Role-Playing Language Agents	Apr 28, 2024	In-Context Learning, Instruction Following	Unverified
URL: Universal Referential Knowledge Linking via Task-instructed Representation Compression	Apr 24, 2024	Information Retrieval, Instruction Following	Code Available
Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models	Apr 23, 2024	Instruction Following	Unverified
Socratic Planner: Self-QA-Based Zero-Shot Planning for Embodied Instruction Following	Apr 21, 2024	In-Context Learning, Instruction Following	Unverified
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning	Apr 19, 2024	Benchmarking, counterfactual	Unverified
Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V	Apr 16, 2024	Instruction Following, Multimodal Reasoning	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
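
The page does not define the "Inst-level loose-accuracy" metric. The name appears to follow the convention of verifiable-instruction benchmarks, where each prompt carries several automatically checkable instructions, the score is the fraction of individual instructions followed, and "loose" means a response also passes if a lightly relaxed variant of it passes. The sketch below illustrates that reading only; the function names, the specific relaxations (dropping the first or last line, removing asterisks), and the sample checkers are all assumptions, not the scoring code behind the numbers above.

```python
from typing import Callable, List, Tuple

# A checker is a hypothetical predicate that verifies one instruction.
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the response plus relaxed variants (assumed relaxations)."""
    lines = response.split("\n")
    no_first = "\n".join(lines[1:]).strip()   # drop an intro line
    no_last = "\n".join(lines[:-1]).strip()   # drop an outro line
    no_both = "\n".join(lines[1:-1]).strip()  # drop both
    base = [response, no_first, no_last, no_both]
    # Also try each variant with markdown emphasis asterisks removed.
    return base + [v.replace("*", "") for v in base]


def instruction_level_loose_accuracy(
    samples: List[Tuple[str, List[Checker]]]
) -> float:
    """Fraction of individual instructions followed, pooled over all samples."""
    followed = 0
    total = 0
    for response, checkers in samples:
        variants = loose_variants(response)
        for check in checkers:
            total += 1
            # An instruction counts as followed if any non-empty variant passes.
            if any(v.strip() and check(v) for v in variants):
                followed += 1
    return followed / total if total else 0.0


# Illustrative data: two responses with four instructions in total.
samples = [
    (
        "Sure, here it is:\nhello world",
        [
            lambda r: r == r.lower(),         # "answer in lowercase"
            lambda r: len(r.split()) <= 2,    # "use at most two words"
            lambda r: "goodbye" in r,         # "include the word goodbye"
        ],
    ),
    ("**DONE**", [lambda r: r.startswith("DONE")]),  # "start with DONE"
]

score = instruction_level_loose_accuracy(samples)
print(score)  # 0.75
```

Under strict scoring (checking only the unmodified response), none of these four instructions would pass; the relaxed variants recover three of them, which is why loose scores are typically at least as high as strict ones.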