SOTAVerified

Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing responses that are both controllable and safe.
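Benchmarks in this area (e.g. IFEval, listed below) often score instruction following with programmatic, rule-based checks rather than human judgment. A minimal sketch, assuming hypothetical constraint-checking functions of the kind such benchmarks use:

```python
# Hypothetical rule-based checks in the style of verifiable
# instruction-following benchmarks (function names are illustrative).

def check_lowercase(response: str) -> bool:
    """Instruction: 'answer in all lowercase'."""
    return response == response.lower()

def check_word_count(response: str, min_words: int) -> bool:
    """Instruction: 'use at least N words'."""
    return len(response.split()) >= min_words

response = "this answer uses exactly six words"
print(check_lowercase(response))      # True
print(check_word_count(response, 5))  # True
```

Because each instruction is machine-verifiable, accuracy can be computed automatically over large model outputs without a human or LLM judge.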

Papers

Showing 51–60 of 1135 papers

| Title | Status | Hype |
| --- | --- | --- |
| LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | Code | 4 |
| Otter: A Multi-Modal Model with In-Context Instruction Tuning | Code | 4 |
| Instruction Tuning with GPT-4 | Code | 4 |
| Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection | Code | 4 |
| IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models | Code | 3 |
| LaViDa: A Large Diffusion Language Model for Multimodal Understanding | Code | 3 |
| LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | Code | 3 |
| VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning | Code | 3 |
| The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on Llama with Vision-Aware and Function-Calling Capabilities | Code | 3 |
| VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model | Code | 3 |
Page 6 of 114

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | AutoIF (Llama3 70B) | Inst-level loose-accuracy | 90.4 | | Unverified |
| 2 | AutoIF (Qwen2 72B) | Inst-level loose-accuracy | 88 | | Unverified |
| 3 | GPT-4 | Inst-level loose-accuracy | 85.37 | | Unverified |
| 4 | PaLM 2 S | Inst-level loose-accuracy | 59.11 | | Unverified |
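The "Inst-level loose-accuracy" metric above comes from IFEval-style evaluation: every individual instruction attached to a prompt counts once, and an instruction passes "loosely" if any of several relaxed transformations of the response (e.g. stripping markdown markers, dropping a leading or trailing line) satisfies its check. A minimal sketch, assuming hypothetical data in the form of per-response lists of check functions:

```python
# Sketch of instruction-level loose accuracy (structure assumed from
# IFEval-style evaluation; not the benchmark's actual implementation).

def loose_variants(response: str):
    """Yield relaxed transformations of the response."""
    lines = response.splitlines()
    yield response
    yield response.replace("*", "")       # strip markdown emphasis markers
    if len(lines) > 1:
        yield "\n".join(lines[1:])        # drop a leading intro line
        yield "\n".join(lines[:-1])       # drop a trailing outro line

def inst_level_loose_accuracy(responses, checks):
    """responses: list of model outputs.
    checks: per-response lists of predicate functions, one per instruction.
    Returns fraction of instructions passed under the loose criterion."""
    passed = total = 0
    for response, instr_checks in zip(responses, checks):
        for check in instr_checks:
            total += 1
            if any(check(v) for v in loose_variants(response)):
                passed += 1
    return passed / total
```

Strict accuracy is the same computation with only the unmodified response checked; the loose variant is more forgiving, which is why loose numbers like those in the table run higher than their strict counterparts.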