
Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers


Showing 421–430 of 1135 papers

Title	Date	Tasks	Status
Only-IF: Revealing the Decisive Effect of Instruction Diversity on Generalization	Oct 7, 2024	Diversity, Instruction Following	Unverified
CS4: Measuring the Creativity of Large Language Models Automatically by Controlling the Number of Story-Writing Constraints	Oct 5, 2024	Instruction Following, Specificity	Code Available
Self-Powered LLM Modality Expansion for Large Speech-Text Models	Oct 4, 2024	Automatic Speech Recognition, Instruction Following	Code Available
SAG: Style-Aligned Article Generation via Model Collaboration	Oct 4, 2024	Hallucination, Instruction Following	Unverified
CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions	Oct 4, 2024	Instruction Following, MMLU	Code Available
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation	Oct 4, 2024	All, Instruction Following	Unverified
Better Instruction-Following Through Minimum Bayes Risk	Oct 3, 2024	Instruction Following	Unverified
Video Instruction Tuning With Synthetic Data	Oct 3, 2024	3D Question Answering (3D-QA)	Unverified
LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model	Oct 3, 2024	image-classification, Image Classification	Unverified
LLaVA-Critic: Learning to Evaluate Multimodal Models	Oct 3, 2024	Instruction Following	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified