SOTAVerified

Instruction Following

Instruction following is a core capability of a language model. This task evaluates how well large models follow human instructions, with the aim that they generate controllable and safe answers.

Papers

Showing 301-325 of 1135 papers

Title | Status | Hype
MoDS: Model-oriented Data Selection for Instruction Tuning | Code | 1
A Survey on Data Selection for LLM Instruction Tuning | Code | 1
Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models | Code | 1
Alexa Arena: A User-Centric Interactive Platform for Embodied AI | Code | 1
Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following | Code | 1
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs | Code | 1
Infer Human's Intentions Before Following Natural Language Instructions | Code | 1
BLEUBERI: BLEU is a surprisingly effective reward for instruction following | Code | 1
AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data | Code | 1
Improving Translation Faithfulness of Large Language Models via Augmenting Instructions | Code | 1
M-IFEval: Multilingual Instruction-Following Evaluation | Code | 1
Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages | Code | 1
MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework | Code | 1
Hybrid Alignment Training for Large Language Models | Code | 1
IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis | Code | 1
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction | Code | 1
AceGPT, Localizing Large Language Models in Arabic | Code | 1
ChatGPT may Pass the Bar Exam soon, but has a Long Way to Go for the LexGLUE benchmark | Code | 1
IHEval: Evaluating Language Models on Following the Instruction Hierarchy | Code | 1
EventHallusion: Diagnosing Event Hallucinations in Video LLMs | Code | 1
Inferring Rewards from Language in Context | Code | 1
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | Code | 1
Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding | Code | 1
Agri-LLaVA: Knowledge-Infused Large Multimodal Assistant on Agricultural Pests and Diseases | Code | 1
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AutoIF (Llama3 70B) | Inst-level loose-accuracy | 90.4 |  | Unverified
2 | AutoIF (Qwen2 72B) | Inst-level loose-accuracy | 88 |  | Unverified
3 | GPT-4 | Inst-level loose-accuracy | 85.37 |  | Unverified
4 | PaLM 2 S | Inst-level loose-accuracy | 59.11 |  | Unverified