Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 351–375 of 1135 papers

Title	Date	Tasks	Status	Hype
DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation	Nov 16, 2023	Decision Making, Instruction Following	Code Available	1
Lexicon Learning for Few Shot Sequence Modeling	Aug 1, 2021	Instruction Following, Machine Translation	Code Available	1
Learning to Map Natural Language Instructions to Physical Quadcopter Control using Simulated Flight	Oct 21, 2019	Continuous Control	Code Available	1
Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping	Feb 16, 2025	Code Generation, Instruction Following	Code Available	1
CB2: Collaborative Natural Language Interaction Research Platform	Mar 14, 2023	Instruction Following	Code Available	1
Engineering flexible machine learning systems by traversing functionally-invariant paths	Apr 30, 2022	Adversarial Robustness, Continual Learning	Code Available	1
Large Language Models as Evaluators for Recommendation Explanations	Jun 5, 2024	Common Sense Reasoning, Instruction Following	Code Available	1
Are Emergent Abilities in Large Language Models just In-Context Learning?	Sep 4, 2023	In-Context Learning, Instruction Following	Code Available	1
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits	Oct 2, 2024	Instruction Following, Math	Code Available	1
MergeBench: A Benchmark for Merging Domain-Specialized LLMs	May 16, 2025	Instruction Following	Code Available	1
Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning	Sep 19, 2024	Form, Instruction Following	Code Available	1
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems	Feb 27, 2024	Instruction Following, RAG	Code Available	1
Lana: A Language-Capable Navigator for Instruction Following and Generation	Mar 15, 2023	Instruction Following, Text Generation	Code Available	1
MoDS: Model-oriented Data Selection for Instruction Tuning	Nov 27, 2023	Instruction Following, model	Code Available	1
A Recipe For Building a Compliant Real Estate Chatbot	Oct 7, 2024	Chatbot, Instruction Following	Code Available	1
Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer	Nov 12, 2023	In-Context Learning, Instruction Following	Code Available	1
Language-Conditioned Reinforcement Learning to Solve Misunderstandings with Action Corrections	Nov 18, 2022	Instruction Following, reinforcement-learning	Code Available	1
Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text	Jun 8, 2025	Instruction Following	Code Available	1
Efficient Inference of Vision Instruction-Following Models with Elastic Cache	Jul 25, 2024	Instruction Following, Text Generation	Code Available	1
FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion	Mar 6, 2025	General Knowledge, Instruction Following	Code Available	1
Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation	Jan 12, 2024	Instruction Following, Translation	Code Available	1
Language Imbalance Driven Rewarding for Multilingual Self-improving	Oct 11, 2024	Arithmetic Reasoning, Instruction Following	Code Available	1
LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors	Jun 20, 2024	16k, Instruction Following	Code Available	1
Can Language Models Follow Multiple Turns of Entangled Instructions?	Mar 17, 2025	Instruction Following, Memorization	Code Available	1
Is In-Context Learning Sufficient for Instruction Following in LLMs?	May 30, 2024	In-Context Learning, Instruction Following	Code Available	1


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
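
The page does not define the "Inst-level loose-accuracy" metric used in the table above. The following is a minimal sketch under two assumptions that are not stated in the source: that "inst-level" means the share of individual instructions satisfied, pooled across all prompts, and that "loose" means a response counts as following an instruction if any lightly normalized variant of it (first or last line dropped, asterisks stripped) passes the check. The names `loose_variants`, `follows_loosely`, and `inst_level_loose_accuracy` are hypothetical, and the checks are toy stand-ins for real instruction verifiers.

```python
from typing import Callable, List, Tuple

# A check takes a response and reports whether one instruction was followed.
Check = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the response plus lightly normalized variants (assumed 'loose' rule)."""
    lines = response.split("\n")
    no_first = "\n".join(lines[1:]).strip()    # drop a possible preamble line
    no_last = "\n".join(lines[:-1]).strip()    # drop a possible closing line
    no_both = "\n".join(lines[1:-1]).strip()   # drop both
    base = [response, no_first, no_last, no_both]
    # Also try each variant with markdown emphasis asterisks removed.
    return base + [v.replace("*", "") for v in base]


def follows_loosely(response: str, check: Check) -> bool:
    """True if any non-empty variant of the response passes the check."""
    return any(bool(v.strip()) and check(v) for v in loose_variants(response))


def inst_level_loose_accuracy(samples: List[Tuple[str, List[Check]]]) -> float:
    """Percentage of instructions followed, pooled over all (response, checks) pairs."""
    total = 0
    passed = 0
    for response, checks in samples:
        for check in checks:
            total += 1
            passed += follows_loosely(response, check)
    return 100.0 * passed / total if total else 0.0


# Toy data: three responses carrying four instructions in total.
samples = [
    ("Sure!\nhello world", [lambda r: r == r.lower(), lambda r: len(r.split()) <= 3]),
    ("**DONE**", [lambda r: "*" not in r]),
    ("hi", [lambda r: len(r) > 10]),
]

print(inst_level_loose_accuracy(samples))  # 75.0
```

In this toy run, three of the four instructions pass: the lowercase check succeeds only after the "Sure!" preamble is dropped, and the no-asterisk check succeeds only after the markdown markers are stripped. Under a strict reading, where only the unmodified response is checked, both would fail.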