SOTAVerified

Instruction Following

Instruction following is a fundamental capability of large language models. This task evaluates how well a large model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 901–950 of 1135 papers

Title | Status | Hype
Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations | | 0
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking | | 0
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | | 0
FLAME: Factuality-Aware Alignment for Large Language Models | | 0
Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning | | 0
Domain Adaptation of VLM for Soccer Video Understanding | | 0
FlowKV: Enhancing Multi-Turn Conversational Coherence in LLMs via Isolated Key-Value Cache Management | | 0
GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents | | 0
FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models | | 0
StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation | | 0
Following Instructions by Imagining and Reaching Visual Goals | | 0
Following Length Constraints in Instructions | | 0
Summarizing a virtual robot's past actions in natural language | | 0
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization | | 0
Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | | 0
Fox-1 Technical Report | | 0
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | | 0
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data | | 0
From “Before” to “After”: Generating Natural Language Instructions from Image Pairs in a Simple Visual Domain | | 0
Superficial Safety Alignment Hypothesis | | 0
From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following | | 0
From Persona to Personalization: A Survey on Role-Playing Language Agents | | 0
Surgical-LLaVA: Toward Surgical Scenario Understanding via Large Language and Vision Models | | 0
From Role-Play to Drama-Interaction: An LLM Solution | | 0
From Words to Workflows: Automating Business Processes | | 0
Frustrated with Code Quality Issues? LLMs can Help! | | 0
Who Taught You That? Tracing Teachers in Model Distillation | | 0
DNA 1.0 Technical Report | | 0
Diversity Measurement and Subset Selection for Instruction Tuning Datasets | | 0
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis | | 0
Diverse and Fine-Grained Instruction-Following Ability Exploration with Synthetic Data | | 0
Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors | | 0
Gemma 3 Technical Report | | 0
DistilQwen2.5: Industrial Practices of Training Distilled Open Lightweight Language Models | | 0
Distilling Internet-Scale Vision-Language Models into Embodied Agents | | 0
Generalization in Instruction Following Systems | | 0
Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers? | | 0
Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts | | 0
Conversational Code Generation: a Case Study of Designing a Dialogue System for Generating Driving Scenarios for Testing Autonomous Vehicles | | 0
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models | | 0
Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning | | 0
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models | | 0
Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control | | 0
Got Compute, but No Data: Lessons From Post-training a Finnish LLM | | 0
Disentangling Length Bias In Preference Learning Via Response-Conditioned Modeling | | 0
GPT4Video: A Unified Multimodal Large Language Model for lnstruction-Followed Understanding and Safety-Aware Generation | | 0
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding | | 0
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | | 0
GROOT-2: Weakly Supervised Multi-Modal Instruction Following Agents | | 0
GROOT: Learning to Follow Instructions by Watching Gameplay Videos | | 0

Benchmark Results

# | Model | Metric | Claimed (%) | Verified | Status
1 | AutoIF (Llama3 70B) | Inst-level loose-accuracy | 90.4 | | Unverified
2 | AutoIF (Qwen2 72B) | Inst-level loose-accuracy | 88 | | Unverified
3 | GPT-4 | Inst-level loose-accuracy | 85.37 | | Unverified
4 | PaLM 2 S | Inst-level loose-accuracy | 59.11 | | Unverified
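The table ranks models only by instruction-level loose accuracy and does not name the benchmark. The metric name matches the one defined in IFEval (Zhou et al., 2023), where each prompt carries one or more automatically verifiable instructions. Below is a minimal sketch of how such a score could be computed, assuming the IFEval-style definition. Accuracy is counted per instruction rather than per prompt. An instruction counts as followed if any non-empty "loosened" version of the response passes its checker: the original, optionally with `*` markdown markers removed, the first line dropped, and/or the last line dropped. The checker functions `no_asterisks` and `all_lowercase` and the sample responses are hypothetical and serve only as illustration.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# A checker returns True if a response satisfies one verifiable instruction.
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the 8 loose variants of a response (assumed IFEval-style):
    optionally strip '*' markdown markers, drop the first line, drop the last line."""
    variants = []
    for strip_md, drop_first, drop_last in product([False, True], repeat=3):
        text = response.replace("*", "") if strip_md else response
        lines = text.split("\n")
        if drop_first:
            lines = lines[1:]
        if drop_last:
            lines = lines[:-1]
        variants.append("\n".join(lines).strip())
    return variants


def instruction_level_loose_accuracy(
    samples: Sequence[Tuple[str, Sequence[Checker]]],
) -> float:
    """Fraction of individual instructions (not prompts) satisfied by at least
    one non-empty loose variant of the corresponding response."""
    followed = total = 0
    for response, checkers in samples:
        variants = loose_variants(response)
        for check in checkers:
            total += 1
            # Empty variants are skipped so that deleting lines cannot trivially pass.
            if any(v and check(v) for v in variants):
                followed += 1
    return followed / total if total else 0.0


# Hypothetical checkers, for illustration only.
def no_asterisks(r: str) -> bool:
    return "*" not in r


def all_lowercase(r: str) -> bool:
    return r == r.lower()


samples = [
    ("Sure!\n**hello world**", [no_asterisks, all_lowercase]),
    ("HELLO", [all_lowercase]),
]
print(f"{instruction_level_loose_accuracy(samples):.3f}")  # → 0.667
```

In this example, the first response passes both of its instructions only after loosening. Removing the `*` markers satisfies `no_asterisks`, and dropping the leading "Sure!" line satisfies `all_lowercase`. The second response fails its single instruction, so the score is 2 out of 3. Scoring per instruction gives partial credit to responses that satisfy only some of a prompt's constraints, and the loose variants forgive boilerplate that a strict checker would reject.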