SOTAVerified

Instruction Following

Instruction following is a fundamental capability of large language models. This category evaluates a model's ability to follow human instructions, with the goal of generating controllable and safe responses.

Papers

Showing 326–350 of 1135 papers

Title | Status | Hype
NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models | Code | 1
MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning | Code | 1
Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning | Code | 1
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems | Code | 1
Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing | Code | 1
On the Multi-turn Instruction Following for Conversational Web Agents | Code | 1
INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models | Code | 1
PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning | Code | 1
Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection | Code | 1
Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models | Code | 1
Answer is All You Need: Instruction-following Text Embedding via Answering the Question | Code | 1
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning | Code | 1
Personalized Language Modeling from Personalized Human Feedback | Code | 1
A Survey on Data Selection for LLM Instruction Tuning | Code | 1
SelectLLM: Can LLMs Select Important Instructions to Annotate? | Code | 1
F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods | Code | 1
Self-Rewarding Language Models | Code | 1
Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation | Code | 1
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Code | 1
Jatmo: Prompt Injection Defense by Task-Specific Finetuning | Code | 1
An In-depth Look at Gemini's Language Abilities | Code | 1
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts | Code | 1
Creative Agents: Empowering Agents with Imagination for Creative Tasks | Code | 1
Generative Parameter-Efficient Fine-Tuning | Code | 1
SeaLLMs -- Large Language Models for Southeast Asia | Code | 1
Page 14 of 46

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AutoIF (Llama3 70B) | Inst-level loose-accuracy | 90.4 | – | Unverified
2 | AutoIF (Qwen2 72B) | Inst-level loose-accuracy | 88 | – | Unverified
3 | GPT-4 | Inst-level loose-accuracy | 85.37 | – | Unverified
4 | PaLM 2 S | Inst-level loose-accuracy | 59.11 | – | Unverified
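For context on the metric above: instruction-level loose accuracy (popularized by the IFEval benchmark) is the percentage of individual instructions a model satisfies, where each response is also checked under a few relaxed transformations (e.g., removing markdown asterisks, dropping the first or last line) and counted as followed if any variant passes. A minimal sketch of this scoring scheme, assuming a hypothetical per-instruction verifier `check(instruction, text)` rather than IFEval's actual verifiers:

```python
def loose_variants(response: str):
    # Relaxed views of the response used by "loose" scoring:
    # the original, one with asterisks stripped, and (for multi-line
    # responses) copies with the first or last line removed.
    lines = response.split("\n")
    variants = [response, response.replace("*", "")]
    if len(lines) > 1:
        variants.append("\n".join(lines[1:]))   # drop first line
        variants.append("\n".join(lines[:-1]))  # drop last line
    return variants


def inst_level_loose_accuracy(examples, check):
    # examples: list of (response, [instruction, ...]) pairs.
    # check(instruction, text) -> bool is a hypothetical verifier for
    # a single verifiable instruction (word count, keyword, format, ...).
    followed = total = 0
    for response, instructions in examples:
        for inst in instructions:
            total += 1
            # An instruction counts as followed if ANY relaxed
            # variant of the response satisfies it.
            if any(check(inst, v) for v in loose_variants(response)):
                followed += 1
    return 100.0 * followed / total if total else 0.0
```

The "strict" variant of the metric would check only the raw response; loose scoring is more forgiving of formatting artifacts, which is why loose numbers like those in the table run a few points higher than strict ones.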