Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing responses that are controllable and safe.

Papers

Showing 201–210 of 1135 papers

Title	Date	Tasks	Status	Hype
Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments	Feb 26, 2025	Instruction Following, Vision and Language Navigation	Unverified	0
TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning	Feb 25, 2025	Instruction Following, Language Modeling	Code available	0
Rank1: Test-Time Compute for Reranking in Information Retrieval	Feb 25, 2025	Information Retrieval, Instruction Following	Code available	2
URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models	Feb 25, 2025	Instruction Following	Unverified	0
UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings	Feb 24, 2025	Diversity, Instruction Following	Unverified	0
ATEB: Evaluating and Improving Advanced NLP Tasks for Text Embedding Models	Feb 24, 2025	Information Retrieval, Instruction Following	Unverified	0
Order Matters: Investigate the Position Bias in Multi-constraint Instruction Following	Feb 24, 2025	Instruction Following, Position	Code available	0
Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing	Feb 24, 2025	Instruction Following, Model Selection	Code available	0
Sequence-level Large Language Model Training with Contrastive Preference Optimization	Feb 23, 2025	Instruction Following, Language Modeling	Unverified	0
NatSGLD: A Dataset with Speech, Gesture, Logic, and Demonstration for Robot Learning in Natural Human-Robot Interaction	Feb 23, 2025	Instruction Following	Code available	0

Benchmark Results

#	Model	Metric	Claimed (%)	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
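The table above ranks models by instruction-level loose accuracy. Assuming this metric follows the IFEval-style definition (every verifiable instruction attached to a prompt is scored separately, and under "loose" scoring an instruction counts as followed if any relaxed variant of the response passes its check), the computation can be sketched in Python as below. The relaxation used here (optionally stripping `*` markdown characters, dropping the first line, dropping the last line, in all eight combinations) is an approximation of IFEval's loose transforms, not its reference code. All function names, including the checkers `all_lowercase` and `no_commas`, are hypothetical illustrations; a real benchmark supplies its own instruction registry and checks.

```python
from itertools import product
from typing import Callable, List, Sequence

# A checker decides whether a response satisfies one verifiable instruction.
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return relaxed variants of a response (assumed IFEval-style transforms).

    Every combination of three optional edits is applied: remove '*'
    characters, drop the first line, drop the last line (8 variants).
    """
    variants = []
    for strip_stars, drop_first, drop_last in product([False, True], repeat=3):
        text = response.replace("*", "") if strip_stars else response
        lines = text.split("\n")
        if drop_first:
            lines = lines[1:]
        if drop_last:
            lines = lines[:-1]
        variants.append("\n".join(lines))
    return variants


def inst_level_loose_accuracy(
    responses: Sequence[str], checkers: Sequence[Sequence[Checker]]
) -> float:
    """Percentage of individual instructions followed under loose scoring.

    responses[i] is the model output for prompt i; checkers[i] holds one
    checker per instruction attached to that prompt.
    """
    if len(responses) != len(checkers):
        raise ValueError("need exactly one checker list per response")
    followed = 0
    total = 0
    for response, prompt_checkers in zip(responses, checkers):
        variants = loose_variants(response)
        for check in prompt_checkers:
            total += 1
            # Followed if ANY non-empty relaxed variant passes the check.
            if any(v.strip() and check(v) for v in variants):
                followed += 1
    return 100.0 * followed / total


# Hypothetical checkers for illustration only.
def all_lowercase(text: str) -> bool:
    return text == text.lower()


def no_commas(text: str) -> bool:
    return "," not in text


responses = ["Sure! Here it is:\nhello world", "**Title**\nA, B"]
checkers = [[all_lowercase, no_commas], [no_commas, all_lowercase]]
print(inst_level_loose_accuracy(responses, checkers))  # 75.0
```

In this example the first response passes `all_lowercase` only after its preamble line is dropped, which is exactly the leniency loose scoring adds; the second response cannot be made all-lowercase by any variant, so three of the four instructions count as followed. Prompt-level accuracy would instead require every instruction on a prompt to pass.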