Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 771–780 of 1135 papers

Title	Date	Tasks	Status	Hype
Mitigating the Influence of Distractor Tasks in LMs with Prior-Aware Decoding	Jan 31, 2024	Instruction Following	Unverified	0
LongAlign: A Recipe for Long Context Alignment of Large Language Models	Jan 31, 2024	Diversity, Instruction Following	Code Available	3
Taking Action Towards Graceful Interaction: The Effects of Performing Actions on Modelling Policies for Instruction Clarification Requests	Jan 30, 2024	Instruction Following	Code Available	0
EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain	Jan 30, 2024	Image Comprehension, Instruction Following	Code Available	2
KAUCUS: Knowledge Augmented User Simulators for Training Language Model Assistants	Jan 29, 2024	Diversity, Instruction Following	Unverified	0
SelectLLM: Can LLMs Select Important Instructions to Annotate?	Jan 29, 2024	Active Learning, Instruction Following	Code Available	1
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty	Jan 26, 2024	Code Generation, Instruction Following	Code Available	7
F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods	Jan 26, 2024	Instruction Following	Code Available	1
Towards 3D Molecule-Text Interpretation in Language Models	Jan 25, 2024	Instruction Following, Language Modeling	Code Available	2
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment	Jan 23, 2024	All, Instruction Following	Code Available	3

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified