SOTAVerified

Instruction Following

Instruction following is a fundamental capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 701-750 of 1135 papers

Title | Status | Hype
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models | Code | 1
Online Continual Learning For Interactive Instruction Following Agents | Code | 1
DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation | | 0
Aligners: Decoupling LLMs and Alignment | Code | 0
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning | | 0
Learning to Decode Collaboratively with Multiple Language Models | Code | 2
IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators | Code | 1
KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions | | 0
X-Shot: A Unified System to Handle Frequent, Few-shot and Zero-shot Learning Simultaneously in Classification | Code | 0
CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following | | 0
OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following | | 0
NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models | Code | 1
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks | Code | 2
LAB: Large-Scale Alignment for ChatBots | Code | 5
Collaborative decoding of critical tokens for boosting factuality of large language models | | 0
MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning | Code | 1
Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning | Code | 1
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems | Code | 1
SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition | Code | 3
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction | Code | 3
Long-Context Language Modeling with Parallel Context Encoding | Code | 2
Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding | | 0
Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing | Code | 1
GraphWiz: An Instruction-Following Language Model for Graph Problems | Code | 2
NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation | | 0
On the Multi-turn Instruction Following for Conversational Web Agents | Code | 1
Unintended Impacts of LLM Alignment on Global Representation | Code | 0
INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models | Code | 1
Towards Robust Instruction Tuning on Multimodal Large Language Models | Code | 0
Zero-shot cross-lingual transfer in instruction tuning of large language models | | 0
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking | | 0
RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models | Code | 0
Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning | Code | 2
Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions? | | 0
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models | | 0
The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis | Code | 0
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models | | 0
PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning | Code | 1
CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models | | 0
The Revolution of Multimodal Large Language Models: A Survey | Code | 2
Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection | Code | 1
A Critical Evaluation of AI Feedback for Aligning Large Language Models | Code | 2
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic | Code | 2
Transformer-based Causal Language Models Perform Clustering | | 0
EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models | Code | 3
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning | Code | 2
Aligning Large Language Models by On-Policy Self-Judgment | Code | 0
Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models | Code | 1
AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation | Code | 0
Efficient Prompt Optimization Through the Lens of Best Arm Identification | | 0
Page 15 of 23

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AutoIF (Llama3 70B) | Inst-level loose-accuracy | 90.4 | | Unverified
2 | AutoIF (Qwen2 72B) | Inst-level loose-accuracy | 88 | | Unverified
3 | GPT-4 | Inst-level loose-accuracy | 85.37 | | Unverified
4 | PaLM 2 S | Inst-level loose-accuracy | 59.11 | | Unverified
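The benchmark metric above, instruction-level loose accuracy, comes from IFEval-style evaluation: each prompt carries one or more atomic, programmatically verifiable instructions, and the metric is the fraction of individual instructions a response satisfies. "Loose" means a response also counts if it passes after relaxed transformations (e.g. dropping a leading "Sure, here you go" line or stripping markdown emphasis). A minimal sketch, assuming hypothetical per-instruction checker functions (the function names and the exact set of relaxations here are illustrative, not the benchmark's implementation):

```python
def loose_variants(response: str) -> list[str]:
    """Generate relaxed variants of a response for loose scoring:
    optionally drop the first and/or last line, and strip '*' emphasis."""
    lines = response.split("\n")
    candidates = [
        response,
        "\n".join(lines[1:]),    # drop first line (e.g. a preamble)
        "\n".join(lines[:-1]),   # drop last line (e.g. a sign-off)
        "\n".join(lines[1:-1]),  # drop both
    ]
    variants = []
    for c in candidates:
        variants.append(c)
        variants.append(c.replace("*", ""))  # strip markdown emphasis
    return variants

def inst_level_loose_accuracy(responses, instruction_checks):
    """responses[i] is a model response; instruction_checks[i] is a list of
    predicates, one per atomic instruction attached to prompt i. An
    instruction counts as followed if ANY loose variant satisfies it."""
    followed = total = 0
    for resp, checks in zip(responses, instruction_checks):
        for check in checks:
            total += 1
            if any(check(v) for v in loose_variants(resp)):
                followed += 1
    return followed / total if total else 0.0
```

For example, a response prefaced with "Sure, here you go:" can still pass a `startswith("Answer:")` check under loose scoring, because the variant with the first line dropped satisfies it; strict accuracy, by contrast, checks only the verbatim response and is therefore always less than or equal to the loose score.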