Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 551–600 of 1135 papers

Title	Date	Tasks	Status
Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations	Apr 8, 2025	Instruction Following, Mixture-of-Experts	Unverified
Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators	Apr 8, 2025	Instruction Following	Unverified
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models	Apr 8, 2025	In-Context Learning, Instruction Following	Unverified
The Hidden Space of Safety: Understanding Preference-Tuned LLMs in Multilingual context	Apr 3, 2025	Instruction Following	Unverified
Effectively Controlling Reasoning Models through Thinking Intervention	Mar 31, 2025	Instruction Following, Safety Alignment	Unverified
Pay More Attention to the Robustness of Prompt for Instruction Data Mining	Mar 31, 2025	Instruction Following	Unverified
Learning to Instruct for Visual Instruction Tuning	Mar 28, 2025	Hallucination, Instruction Following	Unverified
Gemma 3 Technical Report	Mar 25, 2025	Instruction Following, Math	Unverified
OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence	Mar 20, 2025	Instruction Following, Natural Language Understanding	Unverified
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings	Mar 19, 2025	Instruction Following, Large Language Model	Code Available
ThinkPatterns-21k: A Systematic Study on the Impact of Thinking Patterns in LLMs	Mar 17, 2025	Instruction Following	Unverified
ICCO: Learning an Instruction-conditioned Coordinator for Language-guided Task-aligned Multi-robot Control	Mar 15, 2025	Instruction Following, Multi-agent Reinforcement Learning	Unverified
ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning	Mar 14, 2025	Code Generation, Decoder	Code Available
D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning	Mar 14, 2025	Diversity, Instruction Following	Unverified
Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models	Mar 13, 2025	Instruction Following	Unverified
Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding	Mar 12, 2025	Instruction Following, Video Understanding	Unverified
Got Compute, but No Data: Lessons From Post-training a Finnish LLM	Mar 12, 2025	Instruction Following	Unverified
Open-World Skill Discovery from Unsegmented Demonstrations	Mar 11, 2025	Boundary Detection, Event Segmentation	Unverified
DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering	Mar 11, 2025	Form, Instruction Following	Unverified
Robust Multi-Objective Controlled Decoding of Large Language Models	Mar 11, 2025	Instruction Following	Code Available
XIFBench: Evaluating Large Language Models on Multilingual Instruction Following	Mar 10, 2025	Instruction Following, Specificity	Unverified
Dr Genre: Reinforcement Learning from Decoupled LLM Feedback for Generic Text Rewriting	Mar 9, 2025	Instruction Following, Large Language Model	Unverified
WildIFEval: Instruction Following in the Wild	Mar 9, 2025	Instruction Following	Code Available
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information	Mar 7, 2025	Instruction Following	Unverified
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval	Mar 6, 2025	Information Retrieval, Instruction Following	Unverified
Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment	Mar 6, 2025	Instruction Following, Transfer Learning	Code Available
Unified Mind Model: Reimagining Autonomous Agents in the LLM Era	Mar 5, 2025	Instruction Following	Unverified
CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation	Mar 5, 2025	Code Generation, Instruction Following	Unverified
Robust Learning of Diverse Code Edits	Mar 5, 2025	Code Generation, Instruction Following	Unverified
LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach	Mar 5, 2025	Instruction Following, Math	Unverified
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training	Mar 4, 2025	Instruction Following, text-to-speech	Unverified
Iterative Value Function Optimization for Guided Decoding	Mar 4, 2025	Decision Making, Instruction Following	Unverified
In-context Learning vs. Instruction Tuning: The Case of Small and Multilingual Language Models	Mar 3, 2025	In-Context Learning, Instruction Following	Unverified
Re-Imagining Multimodal Instruction Tuning: A Representation View	Mar 2, 2025	Instruction Following, MME	Code Available
Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective	Feb 28, 2025	Instruction Following	Unverified
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge	Feb 27, 2025	GSM8K, HumanEval	Unverified
DataMan: Data Manager for Pre-training Large Language Models	Feb 26, 2025	In-Context Learning, Instruction Following	Unverified
Stay Focused: Problem Drift in Multi-Agent Debate	Feb 26, 2025	Instruction Following	Code Available
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models	Feb 26, 2025	Instruction Following, Vision-Language-Action	Unverified
Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments	Feb 26, 2025	Instruction Following, Vision and Language Navigation	Unverified
URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models	Feb 25, 2025	Instruction Following	Unverified
TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning	Feb 25, 2025	Instruction Following, Language Modeling	Code Available
UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings	Feb 24, 2025	Diversity, Instruction Following	Unverified
Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing	Feb 24, 2025	Instruction Following, Model Selection	Code Available
ATEB: Evaluating and Improving Advanced NLP Tasks for Text Embedding Models	Feb 24, 2025	Information Retrieval, Instruction Following	Unverified
Order Matters: Investigate the Position Bias in Multi-constraint Instruction Following	Feb 24, 2025	Instruction Following, Position	Code Available
NatSGLD: A Dataset with Speech, Gesture, Logic, and Demonstration for Robot Learning in Natural Human-Robot Interaction	Feb 23, 2025	Instruction Following	Code Available
Sequence-level Large Language Model Training with Contrastive Preference Optimization	Feb 23, 2025	Instruction Following, Language Modeling	Unverified
SOTOPIA-Ω: Dynamic Strategy Injection Learning and Social Instruction Following Evaluation for Social Agents	Feb 21, 2025	Instruction Following	Code Available
OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment	Feb 19, 2025	Hallucination, Instruction Following	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified