SOTAVerified

Instruction Following

Instruction following is a foundational capability of large language models. This task evaluates how well a model follows human instructions, with the goal of generating controllable and safe responses.

Papers

Showing 726–750 of 1135 papers

Title | Status | Hype
On the Multi-turn Instruction Following for Conversational Web Agents | Code | 1
Unintended Impacts of LLM Alignment on Global Representation | Code | 0
INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models | Code | 1
Towards Robust Instruction Tuning on Multimodal Large Language Models | Code | 0
Zero-shot cross-lingual transfer in instruction tuning of large language models | | 0
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking | | 0
RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models | Code | 0
Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning | Code | 2
Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions? | | 0
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models | | 0
The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis | Code | 0
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models | | 0
PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning | Code | 1
CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models | | 0
The Revolution of Multimodal Large Language Models: A Survey | Code | 2
Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection | Code | 1
A Critical Evaluation of AI Feedback for Aligning Large Language Models | Code | 2
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic | Code | 2
Transformer-based Causal Language Models Perform Clustering | | 0
EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models | Code | 3
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning | Code | 2
Aligning Large Language Models by On-Policy Self-Judgment | Code | 0
Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models | Code | 1
AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation | Code | 0
Efficient Prompt Optimization Through the Lens of Best Arm Identification | | 0
Page 30 of 46

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AutoIF (Llama3 70B) | Inst-level loose-accuracy | 90.4 | | Unverified
2 | AutoIF (Qwen2 72B) | Inst-level loose-accuracy | 88 | | Unverified
3 | GPT-4 | Inst-level loose-accuracy | 85.37 | | Unverified
4 | PaLM 2 S | Inst-level loose-accuracy | 59.11 | | Unverified
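The "Inst-level loose-accuracy" metric in the table above appears to follow the IFEval convention: each prompt carries one or more verifiable instructions, the instruction-level score is the fraction of individual instructions satisfied, and the "loose" variant also accepts a response after lenient normalizations (e.g. stripping markdown emphasis or dropping a leading/trailing line). A minimal sketch under those assumptions; the `checkers` structure and the example checker functions here are illustrative, not the benchmark's actual API:

```python
# Sketch of instruction-level "loose" accuracy, IFEval-style (assumed).
# Each instruction is a predicate over the response text; the loose
# variant passes if ANY lenient variant of the response satisfies it.

def loose_variants(response: str):
    """Yield lenient normalizations of a model response."""
    yield response
    yield response.replace("*", "")      # strip markdown emphasis
    lines = response.splitlines()
    if len(lines) > 1:
        yield "\n".join(lines[1:])       # drop a leading preamble line
        yield "\n".join(lines[:-1])      # drop a trailing sign-off line

def inst_level_loose_accuracy(examples):
    """examples: list of (response, [checker, ...]) pairs.
    Returns the fraction of individual instructions satisfied."""
    followed = total = 0
    for response, checkers in examples:
        for check in checkers:
            total += 1
            if any(check(v) for v in loose_variants(response)):
                followed += 1
    return followed / total if total else 0.0

# Hypothetical checkers for illustration only:
has_three_bullets = lambda r: r.count("- ") >= 3
no_commas = lambda r: "," not in r

examples = [
    ("- a\n- b\n- c", [has_three_bullets, no_commas]),
    ("Sure! Here you go:\nhello world", [no_commas]),
]
print(inst_level_loose_accuracy(examples))  # all 3 instructions pass -> 1.0
```

Note the denominator counts instructions, not prompts: a prompt with three embedded constraints contributes three units, which is what distinguishes instruction-level from prompt-level accuracy.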