Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 61–70 of 1135 papers

Title	Date	Tasks	Status	Hype
Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception	Oct 16, 2024	Binary Classification, Chunking	Code Available	3
ASFT: Aligned Supervised Fine-Tuning through Absolute Likelihood	Sep 14, 2024	Instruction Following, Text Generation	Code Available	3
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems	Sep 2, 2024	Benchmarking, Instruction Following	Code Available	3
1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data	Aug 7, 2024	16k, 2k	Code Available	3
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models	Jul 17, 2024	Instruction Following, Vision and Language Navigation	Code Available	3
AudioBench: A Universal Benchmark for Audio Large Language Models	Jun 23, 2024	Audio Scene Understanding, Instruction Following	Code Available	3
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models	Jun 19, 2024	Instruction Following	Code Available	3
Refusal in Language Models Is Mediated by a Single Direction	Jun 17, 2024	Instruction Following	Code Available	3
MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning	Jun 13, 2024	Instruction Following, Math	Code Available	3
FlashFace: Human Image Personalization with High-fidelity Identity Preservation	Mar 25, 2024	Face Swapping, Image Generation	Code Available	3

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
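
The page does not define the metric in the table above, so the following is only a rough sketch of how an instruction-level loose accuracy could be computed. It assumes that "instruction-level" means the share of individual instructions satisfied across all prompts, and that "loose" means a response also counts when a lightly relaxed variant of it passes the check. The relaxation rules and every name here (loose_variants, followed_loosely, instruction_level_accuracy, samples) are hypothetical and are not taken from any benchmark's code.

```python
from typing import Callable, List, Tuple

# A checker reports whether one instruction was followed by a response.
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the response plus relaxed variants (assumed relaxation rules)."""
    lines = response.split("\n")
    candidates = [
        response,
        response.replace("*", ""),  # drop markdown emphasis markers
        "\n".join(lines[1:]),       # drop a possible introductory first line
        "\n".join(lines[:-1]),      # drop a possible closing last line
    ]
    # Skip empty variants so an empty string cannot pass a check by accident.
    return [c for c in candidates if c.strip()]


def followed_loosely(response: str, check: Checker) -> bool:
    """An instruction counts as followed if any variant passes its checker."""
    return any(check(variant) for variant in loose_variants(response))


def instruction_level_accuracy(
    samples: List[Tuple[str, List[Checker]]], loose: bool = True
) -> float:
    """Percentage of individual instructions followed across all samples."""
    total = 0
    passed = 0
    for response, checkers in samples:
        for check in checkers:
            total += 1
            ok = followed_loosely(response, check) if loose else check(response)
            passed += int(ok)
    return 100.0 * passed / total if total else 0.0


# Toy data: each sample pairs a model response with its instruction checkers.
samples = [
    (
        "Sure, here it is:\nHELLO WORLD",
        [lambda r: r == r.upper(), lambda r: "WORLD" in r],
    ),
    ("short answer", [lambda r: len(r.split()) >= 5]),
]

print(round(instruction_level_accuracy(samples, loose=False), 1))
print(round(instruction_level_accuracy(samples, loose=True), 1))
```

In this toy data the uppercase instruction fails under the strict check because of the introductory line, but passes under the loose check once that line is dropped, which is the kind of difference a loose metric is meant to absorb.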