Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 391–400 of 1135 papers

Title	Date	Tasks	Status	Hype
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization	Oct 14, 2024	Explanation Generation, Image Forgery Detection	Unverified	0
Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search	Oct 14, 2024	Instruction Following	Unverified	0
DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model	Oct 14, 2024	Diversity, Instruction Following	Unverified	0
Thinking LLMs: General Instruction Following with Thought Generation	Oct 14, 2024	General Knowledge, Instruction Following	Unverified	0
Conversational Code Generation: a Case Study of Designing a Dialogue System for Generating Driving Scenarios for Testing Autonomous Vehicles	Oct 13, 2024	Autonomous Vehicles, Code Generation	Unverified	0
Surgical-LLaVA: Toward Surgical Scenario Understanding via Large Language and Vision Models	Oct 13, 2024	Instruction Following, Question Answering	Unverified	0
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation	Oct 12, 2024	Instruction Following, RAG	Code Available	2
Are You Human? An Adversarial Benchmark to Expose LLMs	Oct 12, 2024	Instruction Following	Unverified	0
SeRA: Self-Reviewing and Alignment of Large Language Models using Implicit Reward Margins	Oct 12, 2024	Instruction Following	Unverified	0
Nudging: Inference-time Alignment of LLMs via Guided Decoding	Oct 11, 2024	General Knowledge, GSM8K	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
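
The metric name in the table suggests an accuracy computed per instruction rather than per prompt, with a relaxed ("loose") check. The page does not show the evaluator itself, so the following is only a minimal sketch under stated assumptions: each prompt carries one or more verifiable instructions, each instruction has a checker function, and the loose mode accepts a response if any lightly cleaned variant of it passes. The function names and the particular set of relaxations (stripping asterisks, dropping the first or last line) are illustrative assumptions, not the benchmark's confirmed implementation.

```python
from typing import Callable, List, Sequence

# A checker returns True when a response satisfies one instruction.
Checker = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the response plus relaxed variants (assumed relaxation set)."""
    lines = response.split("\n")
    variants = [response, response.replace("*", "")]
    if len(lines) > 1:
        variants.append("\n".join(lines[1:]))   # drop a leading preamble line
        variants.append("\n".join(lines[:-1]))  # drop a trailing sign-off line
    return variants


def instruction_level_accuracy(
    responses: Sequence[str],
    checkers_per_prompt: Sequence[Sequence[Checker]],
    loose: bool = True,
) -> float:
    """Fraction of individual instructions followed, pooled over all prompts."""
    followed = 0
    total = 0
    for response, checkers in zip(responses, checkers_per_prompt):
        candidates = loose_variants(response) if loose else [response]
        for check in checkers:
            total += 1
            # In loose mode, one passing variant is enough.
            if any(check(candidate) for candidate in candidates):
                followed += 1
    return followed / total if total else 0.0


# Hypothetical toy data: two prompts, three instructions in total.
responses = ["Sure, here it is:\nhello world", "HELLO"]
checkers_per_prompt = [
    [lambda r: r == r.lower(), lambda r: len(r.split()) <= 2],
    [lambda r: r.isupper()],
]

strict_score = instruction_level_accuracy(responses, checkers_per_prompt, loose=False)
loose_score = instruction_level_accuracy(responses, checkers_per_prompt, loose=True)
```

In this toy case the first response fails both of its instructions under the strict check because of its preamble line, but passes them once that line is dropped, so the loose score is higher than the strict one. Scores in the table appear to be percentages, so a value from this sketch would be multiplied by 100 to compare.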