Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers


Showing 591–600 of 1135 papers

Title	Date	Tasks	Status
URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models	Feb 25, 2025	Instruction Following	Unverified
TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning	Feb 25, 2025	Instruction Following, Language Modeling	Code Available
UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings	Feb 24, 2025	Diversity, Instruction Following	Unverified
Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing	Feb 24, 2025	Instruction Following, Model Selection	Code Available
ATEB: Evaluating and Improving Advanced NLP Tasks for Text Embedding Models	Feb 24, 2025	Information Retrieval, Instruction Following	Unverified
Order Matters: Investigate the Position Bias in Multi-constraint Instruction Following	Feb 24, 2025	Instruction Following, Position	Code Available
NatSGLD: A Dataset with Speech, Gesture, Logic, and Demonstration for Robot Learning in Natural Human-Robot Interaction	Feb 23, 2025	Instruction Following	Code Available
Sequence-level Large Language Model Training with Contrastive Preference Optimization	Feb 23, 2025	Instruction Following, Language Modeling	Unverified
SOTOPIA-Ω: Dynamic Strategy Injection Learning and Social Instruction Following Evaluation for Social Agents	Feb 21, 2025	Instruction Following	Code Available
OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment	Feb 19, 2025	Hallucination, Instruction Following	Unverified


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
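
The table above reports "Inst-level loose-accuracy" without defining it. As a rough illustration only, the sketch below assumes the metric follows the common convention for verifiable-instruction benchmarks: each instruction attached to a prompt is scored separately, and a response counts as following an instruction if the original text or any relaxed variant of it passes the check. The relaxation set used here (dropping the first line, the last line, or both, and removing asterisks) and all function names are assumptions for illustration, not something this page specifies.

```python
from typing import Callable, List, Tuple

# A checker is a hypothetical predicate: True if the response satisfies one instruction.
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the response plus relaxed variants (assumed relaxation set)."""
    lines = response.split("\n")
    base = [
        response,
        "\n".join(lines[1:]).strip(),    # drop a leading line such as "Sure, here it is:"
        "\n".join(lines[:-1]).strip(),   # drop a trailing line such as a sign-off
        "\n".join(lines[1:-1]).strip(),  # drop both
    ]
    # Also try every variant with markdown emphasis markers removed.
    return base + [v.replace("*", "") for v in base]


def follows_loosely(response: str, checker: Checker) -> bool:
    """An instruction is followed if any non-empty variant passes its checker."""
    return any(bool(v) and checker(v) for v in loose_variants(response))


def inst_level_loose_accuracy(samples: List[Tuple[str, List[Checker]]]) -> float:
    """Percentage of individual instructions followed, pooled over all prompts."""
    total = 0
    followed = 0
    for response, checkers in samples:
        for checker in checkers:
            total += 1
            followed += follows_loosely(response, checker)
    return 100.0 * followed / total if total else 0.0


# Toy data: two responses carrying three instructions in total.
samples = [
    ("Sure, here it is:\nHELLO WORLD", [str.isupper, lambda r: len(r.split()) <= 5]),
    ("hello", [str.isupper]),
]
score = inst_level_loose_accuracy(samples)
```

In this toy data, both instructions on the first response pass once the introductory line is dropped, while the second response fails its single instruction, so two of three instructions count as followed. A prompt-level variant would instead require every instruction attached to a prompt to pass before counting that prompt.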