Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 531–540 of 1135 papers

Title	Date	Tasks	Status
T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models	May 8, 2025	Instruction Following, Text-to-Video Generation	Unverified
Incentivizing Inclusive Contributions in Model Sharing Markets	May 5, 2025	Federated Learning, Instruction Following	Unverified
PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents	May 2, 2025	Instruction Following, Response Generation	Unverified
T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation	May 1, 2025	counterfactual, Instruction Following	Unverified
UAV-VLN: End-to-End Vision Language guided Navigation for UAVs	Apr 30, 2025	Common Sense Reasoning, Instruction Following	Unverified
Ask, Fail, Repeat: Meeseeks, an Iterative Feedback Benchmark for LLMs' Multi-turn Instruction-Following Ability	Apr 30, 2025	Instruction Following, Intent Recognition	Unverified
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models	Apr 29, 2025	Benchmarking, Dataset Generation	Code Available
CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks	Apr 29, 2025	Instruction Following	Unverified
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs	Apr 24, 2025	Image-text Retrieval, Instruction Following	Unverified
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost	Apr 23, 2025	Instruction Following, Language Modeling	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
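
The page does not define the "Inst-level loose-accuracy" metric reported above. As a rough sketch, assuming it follows the common IFEval-style convention: each prompt carries one or more verifiable instructions, instruction-level accuracy is the share of individual instructions followed across all prompts, and the "loose" variant counts an instruction as followed if any relaxed view of the response passes the check. The relaxed views used here (asterisks removed, first or last line dropped) and all function names are illustrative assumptions, not the scoring code behind the table.

```python
from typing import Callable, List, Sequence

# A check receives a response and returns True if the instruction was followed.
Check = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return relaxed views of a response (assumed set of relaxations)."""
    lines = response.split("\n")
    bases = [
        response,                # unchanged
        "\n".join(lines[1:]),    # first line dropped (e.g. a preamble)
        "\n".join(lines[:-1]),   # last line dropped (e.g. a sign-off)
        "\n".join(lines[1:-1]),  # both dropped
    ]
    variants = []
    for base in bases:
        variants.append(base)
        variants.append(base.replace("*", ""))  # markdown emphasis removed
    return variants


def instruction_level_accuracy(
    responses: Sequence[str],
    checks_per_response: Sequence[Sequence[Check]],
    loose: bool = True,
) -> float:
    """Percentage of individual instructions followed across all responses."""
    followed = 0
    total = 0
    for response, checks in zip(responses, checks_per_response):
        for check in checks:
            total += 1
            if loose:
                ok = any(check(v) for v in loose_variants(response))
            else:
                ok = check(response)
            followed += int(ok)
    return 100.0 * followed / total if total else 0.0


# Hypothetical example: two responses, three instructions in total.
example_responses = ["Sure, here it is:\nhello world", "HELLO"]
example_checks = [
    [lambda r: r == r.lower(), lambda r: len(r.split()) <= 2],
    [lambda r: r.isupper()],
]
strict_score = instruction_level_accuracy(example_responses, example_checks, loose=False)
loose_score = instruction_level_accuracy(example_responses, example_checks, loose=True)
```

In this toy case the first response fails both of its checks under strict scoring because of the preamble line, but passes them once that line is dropped, so the loose score is higher than the strict one. One caveat of this sketch: a relaxed view can be an empty string, which some checks pass trivially.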