SOTAVerified

Instruction Following

Instruction following is a fundamental capability of large language models. This task evaluates how well a model follows human instructions, with the goal of generating controllable and safe responses.

Papers

Showing 201–225 of 1135 papers

| Title | Status | Hype |
| --- | --- | --- |
| Lion: Adversarial Distillation of Proprietary Large Language Models | Code | 2 |
| Learning to Decode Collaboratively with Multiple Language Models | Code | 2 |
| BLSP-Emo: Towards Empathetic Large Speech-Language Models | Code | 2 |
| Aligning Modalities in Vision Large Language Models via Preference Fine-tuning | Code | 2 |
| EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain | Code | 2 |
| #InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models | Code | 2 |
| LITA: Language Instructed Temporal-Localization Assistant | Code | 2 |
| LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents | Code | 2 |
| MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs | Code | 2 |
| Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning | Code | 1 |
| Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding | Code | 1 |
| A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following | Code | 1 |
| Do LLMs "know" internally when they follow instructions? | Code | 1 |
| A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models | Code | 1 |
| Language Imbalance Driven Rewarding for Multilingual Self-improving | Code | 1 |
| BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models | Code | 1 |
| Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Code | 1 |
| Instruction Following without Instruction Tuning | Code | 1 |
| Large Language Models as Evaluators for Recommendation Explanations | Code | 1 |
| Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Code | 1 |
| A modular vision language navigation and manipulation framework for long horizon compositional tasks in indoor environment | Code | 1 |
| Lana: A Language-Capable Navigator for Instruction Following and Generation | Code | 1 |
| KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models | Code | 1 |
| AlpaGasus: Training A Better Alpaca with Fewer Data | Code | 1 |
| DANLI: Deliberative Agent for Following Natural Language Instructions | Code | 1 |
Page 9 of 46

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | AutoIF (Llama3 70B) | Inst-level loose-accuracy | 90.4 | — | Unverified |
| 2 | AutoIF (Qwen2 72B) | Inst-level loose-accuracy | 88 | — | Unverified |
| 3 | GPT-4 | Inst-level loose-accuracy | 85.37 | — | Unverified |
| 4 | PaLM 2 S | Inst-level loose-accuracy | 59.11 | — | Unverified |
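The metric in the table, instruction-level loose accuracy, comes from IFEval-style evaluation: each prompt carries one or more automatically verifiable instructions, and the "loose" variant retries the check after relaxing the response (e.g. dropping a leading/trailing line or markdown markers). The sketch below is a hypothetical illustration of that scoring scheme, not the benchmark's actual implementation; the function names and the exact set of relaxations are assumptions.

```python
def loose_variants(response: str) -> list[str]:
    """Relaxed versions of a response for loose checking (assumed relaxations:
    drop the first line, drop the last line, strip '*' markdown emphasis)."""
    lines = response.splitlines()
    variants = [response]
    if len(lines) > 1:
        variants.append("\n".join(lines[1:]))   # without first line
        variants.append("\n".join(lines[:-1]))  # without last line
    variants.append(response.replace("*", ""))  # without markdown emphasis
    return variants


def inst_level_loose_accuracy(responses, instruction_checks) -> float:
    """Percentage of individual instructions followed under loose checking.

    responses:          list of model outputs, one per prompt
    instruction_checks: parallel list; each element is a list of predicates,
                        one per verifiable instruction attached to that prompt
    """
    followed = total = 0
    for resp, checks in zip(responses, instruction_checks):
        for check in checks:
            total += 1
            # An instruction counts as followed if ANY relaxed variant passes.
            if any(check(v) for v in loose_variants(resp)):
                followed += 1
    return 100.0 * followed / total if total else 0.0
```

For example, a check requiring the response to start with "hello" fails strictly on `"# Title\nhello world"` but passes loosely once the markdown heading line is dropped, which is why loose accuracy is always at least as high as strict accuracy.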