Instruction Following

Instruction following is a basic capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe answers.

Papers

Showing 471–480 of 1135 papers

Title	Date	Tasks	Status	Hype
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs	Sep 3, 2024	16k, Benchmarking	Code Available	1
Self-Judge: Selective Instruction Following with Alignment Self-Evaluation	Sep 2, 2024	Instruction Following, Semantic Similarity	Code Available	0
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems	Sep 2, 2024	Benchmarking, Instruction Following	Code Available	3
Language Models Benefit from Preparation with Elicited Knowledge	Sep 2, 2024	Instruction Following, Prompt Engineering	Unverified	0
Does Alignment Tuning Really Break LLMs' Internal Confidence?	Aug 31, 2024	Instruction Following	Code Available	0
M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation	Aug 29, 2024	Instruction Following, Medical Report Generation	Unverified	0
Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity	Aug 29, 2024	Code Generation, Diversity	Unverified	0
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding	Aug 28, 2024	Instruction Following, scientific discovery	Code Available	2
Multi-Modal Instruction-Tuning Small-Scale Language-and-Vision Assistant for Semiconductor Electron Micrograph Analysis	Aug 27, 2024	Instruction Following, Question Answering	Unverified	0
Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis	Aug 27, 2024	Instruction Following, Language Modeling	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified