Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 311–320 of 1135 papers

Title	Date	Tasks	Status	Hype
LLMs for Generalizable Language-Conditioned Policy Learning under Minimal Data Requirements	Dec 9, 2024	Decision Making, Instruction Following	Unverified	0
Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families	Dec 9, 2024	Emotional Intelligence, Instruction Following	Code Available	0
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models	Dec 8, 2024	Instruction Following, Natural Language Understanding	Code Available	1
GROOT-2: Weakly Supervised Multi-Modal Instruction Following Agents	Dec 7, 2024	Instruction Following	Unverified	0
Compositional Image Retrieval via Instruction-Aware Contrastive Learning	Dec 7, 2024	Contrastive Learning, Image Retrieval	Code Available	0
RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts	Dec 7, 2024	Change Detection, Image Comprehension	Code Available	1
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases	Dec 6, 2024	Instruction Following	Unverified	0
LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs	Dec 6, 2024	Entity Alignment, Entity Embeddings	Unverified	0
If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs	Dec 5, 2024	Code Generation, Instruction Following	Unverified	0
VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding	Dec 4, 2024	Hallucination, Instruction Following	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified