Instruction Following

Instruction following is a fundamental capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing responses that are controllable and safe.

Papers

Showing 971–980 of 1135 papers

Title	Date	Tasks	Status	Hype
Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models	Jul 3, 2023	Form, Instruction Following	Code Available	1
Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control	Jun 30, 2023	Instruction Following	Unverified	0
KITE: Keypoint-Conditioned Policies for Semantic Manipulation	Jun 29, 2023	Instruction Following, Object	Unverified	0
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding	Jun 29, 2023	16k, Image Captioning	Code Available	2
On the Exploitability of Instruction Tuning	Jun 28, 2023	Data Poisoning, Instruction Following	Code Available	1
OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue	Jun 21, 2023	Instruction Following, Language Modeling	Code Available	1
BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models	Jun 19, 2023	Instruction Following, Text Generation	Code Available	2
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation	Jun 17, 2023	Decision Making, Instruction Following	Unverified	0
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models	Jun 15, 2023	Hallucination, Image Captioning	Code Available	2
MiniLLM: Knowledge Distillation of Large Language Models	Jun 14, 2023	Instruction Following, Knowledge Distillation	Code Available	2


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
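The table does not define "Inst-level loose-accuracy". Assuming it follows the IFEval definition (Zhou et al., 2023), each prompt carries one or more automatically checkable instructions. Instruction-level accuracy is the fraction of all individual instructions that are followed; prompt-level accuracy instead requires every instruction in a prompt to pass. The "loose" variant re-checks relaxed copies of the response (markdown asterisks removed, first and/or last line dropped) so that boilerplate such as "Sure, here you go:" is not penalized. The sketch below illustrates that idea. The checker functions (`all_lowercase`, `ends_with_period`) and the helper names are hypothetical and are not taken from any benchmark's code.

```python
from itertools import product
from typing import Callable, List

# Hypothetical checker type: returns True if a response satisfies one
# verifiable instruction (e.g. "answer in lowercase").
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the 8 relaxed copies of a response used for loose scoring.

    IFEval-style relaxation (assumed): optionally strip markdown asterisks,
    optionally drop the first line, optionally drop the last line.
    """
    variants = []
    for strip_md, drop_first, drop_last in product([False, True], repeat=3):
        text = response.replace("*", "") if strip_md else response
        lines = text.split("\n")
        if drop_first:
            lines = lines[1:]
        if drop_last:
            lines = lines[:-1]
        variants.append("\n".join(lines).strip())
    return variants


def instruction_level_accuracy(
    responses: List[str],
    checkers_per_prompt: List[List[Checker]],
    loose: bool = True,
) -> float:
    """Fraction of all individual instructions that were followed."""
    followed = total = 0
    for response, checkers in zip(responses, checkers_per_prompt):
        candidates = loose_variants(response) if loose else [response]
        for check in checkers:
            total += 1
            # Loose: the instruction counts as followed if ANY non-empty
            # relaxed copy passes; strict: only the raw response is checked.
            if any(c and check(c) for c in candidates):
                followed += 1
    return followed / total if total else 0.0


# Hypothetical checkers for the usage example.
def all_lowercase(text: str) -> bool:
    return text == text.lower()


def ends_with_period(text: str) -> bool:
    return text.endswith(".")


responses = ["**Sure!**\nthe sky is blue.", "Hello World"]
checkers = [[all_lowercase, ends_with_period], [ends_with_period]]

print(round(100 * instruction_level_accuracy(responses, checkers), 2))               # 66.67
print(round(100 * instruction_level_accuracy(responses, checkers, loose=False), 2))  # 33.33
```

In this toy example the first response fails the lowercase instruction under strict scoring because of its capitalized "**Sure!**" line. Under loose scoring it passes, because dropping that first line leaves an all-lowercase answer. The second response misses its only instruction either way.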