Instruction Following

Instruction following is the basic task of the model. This task is dedicated to evaluating the ability of the large model to follow human instructions. It is hoped that the model can generate controllable and safe answers.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 111–120 of 1135 papers

Title	Date	Tasks	Status	Hype
Efficient Telecom Specific LLM: TSLAM-Mini with QLoRA and Digital Twin Data	May 10, 2025	Instruction Followingparameter-efficient fine-tuning	—Unverified	0
MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks	May 9, 2025	DiagnosticInstruction Following	CodeCode Available	1
Assessing Robustness to Spurious Correlations in Post-Training Language Models	May 9, 2025	Instruction FollowingMathematical Reasoning	—Unverified	0
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding	May 8, 2025	document understandingInstruction Following	CodeCode Available	1
T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models	May 8, 2025	Instruction FollowingText-to-Video Generation	—Unverified	0
Incentivizing Inclusive Contributions in Model Sharing Markets	May 5, 2025	Federated LearningInstruction Following	—Unverified	0
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis	May 5, 2025	ChatbotDecoder	CodeCode Available	3
PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents	May 2, 2025	Instruction FollowingResponse Generation	—Unverified	0
T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation	May 1, 2025	counterfactualInstruction Following	—Unverified	0
UAV-VLN: End-to-End Vision Language guided Navigation for UAVs	Apr 30, 2025	Common Sense ReasoningInstruction Following	—Unverified	0

Show:10 25 50

← PrevPage 12 of 114Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified