Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 301–310 of 1135 papers

Title	Date	Tasks	Status	Hype
LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts	Dec 16, 2024	General Knowledge, Instruction Following	Code Available	2
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models	Dec 16, 2024	Instruction Following	Code Available	1
ChipAlign: Instruction Alignment in Large Language Models for Chip Design via Geodesic Interpolation	Dec 15, 2024	Instruction Following	Unverified	0
Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval	Dec 15, 2024	Image Retrieval, Instruction Following	Unverified	0
Empowering LLMs to Understand and Generate Complex Vector Graphics	Dec 15, 2024	Instruction Following, Vector Graphics	Unverified	0
VLR-Bench: Multilingual Benchmark Dataset for Vision-Language Retrieval Augmented Generation	Dec 13, 2024	Instruction Following, Question Answering	Unverified	0
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM	Dec 12, 2024	Image Comprehension, Image Generation	Unverified	0
LLaVA-Zip: Adaptive Visual Token Compression with Intrinsic Image Information	Dec 11, 2024	Data Augmentation, Instruction Following	Unverified	0
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs	Dec 11, 2024	ARC, GSM8K	Unverified	0
PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models	Dec 9, 2024	Benchmarking, Instruction Following	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
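
The table above does not define its metric. As a rough illustration, the sketch below assumes "Inst-level loose-accuracy" follows the IFEval-style convention: every individual instruction across all prompts is scored separately, and an instruction counts as followed if the response passes its check either as written or after a mild relaxation (dropping the first or last line, stripping markdown asterisks). The function names and the toy checkers are hypothetical and are not taken from any listed paper.

```python
from typing import Callable, List, Sequence, Tuple

# A checker takes a response string and returns True if the instruction is met.
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the response plus relaxed versions (assumed IFEval-style)."""
    lines = response.split("\n")
    no_first = "\n".join(lines[1:]).strip()
    no_last = "\n".join(lines[:-1]).strip()
    no_both = "\n".join(lines[1:-1]).strip()
    base = [response, no_first, no_last, no_both]
    # Also try each variant with markdown emphasis asterisks removed.
    return base + [v.replace("*", "") for v in base]


def follows_loosely(response: str, check: Checker) -> bool:
    """An instruction counts as followed if any non-empty variant passes."""
    return any(bool(v.strip()) and check(v) for v in loose_variants(response))


def inst_level_loose_accuracy(
    samples: Sequence[Tuple[str, Sequence[Checker]]]
) -> float:
    """Percentage of individual instructions followed, pooled over all prompts."""
    total = 0
    followed = 0
    for response, checkers in samples:
        for check in checkers:
            total += 1
            followed += follows_loosely(response, check)
    return 100.0 * followed / total if total else 0.0


# Toy data (hypothetical): two responses, two instructions each.
samples = [
    ("Sure, here it is:\nhello world",
     [lambda r: r.islower(), lambda r: len(r.split()) == 2]),
    ("**DONE**",
     [lambda r: r == "DONE", lambda r: r.islower()]),
]
print(inst_level_loose_accuracy(samples))  # 75.0
```

In this toy case three of the four instructions pass under the relaxed check, whereas a strict check on the raw responses would pass none. That gap is the reason loose scores are typically reported alongside strict ones.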