Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 376–400 of 1135 papers

Title	Date	Tasks	Status	Hype
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound	Oct 19, 2024	Instruction Following, Knowledge Distillation	Unverified	0
LoGU: Long-form Generation with Uncertainty Expressions	Oct 18, 2024	Form, Instruction Following	Code Available	1
Do LLMs "know" internally when they follow instructions?	Oct 18, 2024	Instruction Following, Prompt Engineering	Code Available	1
Do LLMs estimate uncertainty well in instruction-following?	Oct 18, 2024	Instruction Following	Code Available	0
Boosting LLM Translation Skills without General Ability Loss via Rationale Distillation	Oct 17, 2024	General Knowledge, Instruction Following	Unverified	0
LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning	Oct 17, 2024	Image Classification	Code Available	0
POROver: Improving Safety and Reducing Overrefusal in Large Language Models with Overgeneration and Preference Optimization	Oct 16, 2024	Instruction Following	Code Available	0
Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception	Oct 16, 2024	Binary Classification, Chunking	Code Available	3
Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks	Oct 16, 2024	Instruction Following, Multiple-choice	Code Available	0
RuleRAG: Rule-guided retrieval-augmented generation with language models for question answering	Oct 15, 2024	In-Context Learning, Instruction Following	Code Available	1
Improving Instruction-Following in Language Models through Activation Steering	Oct 15, 2024	Instruction Following, Text Generation	Unverified	0
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling	Oct 15, 2024	Instruction Following, Knowledge Distillation	Unverified	0
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding	Oct 15, 2024	Instruction Following, Visual Question Answering (VQA)	Unverified	0
Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs	Oct 14, 2024	Instruction Following	Unverified	0
How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective	Oct 14, 2024	Density Ratio Estimation, GSM8K	Code Available	0
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization	Oct 14, 2024	Explanation Generation, Image Forgery Detection	Unverified	0
Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search	Oct 14, 2024	Instruction Following	Unverified	0
DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model	Oct 14, 2024	Diversity, Instruction Following	Unverified	0
Thinking LLMs: General Instruction Following with Thought Generation	Oct 14, 2024	General Knowledge, Instruction Following	Unverified	0
Conversational Code Generation: a Case Study of Designing a Dialogue System for Generating Driving Scenarios for Testing Autonomous Vehicles	Oct 13, 2024	Autonomous Vehicles, Code Generation	Unverified	0
Surgical-LLaVA: Toward Surgical Scenario Understanding via Large Language and Vision Models	Oct 13, 2024	Instruction Following, Question Answering	Unverified	0
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation	Oct 12, 2024	Instruction Following, RAG	Code Available	2
Are You Human? An Adversarial Benchmark to Expose LLMs	Oct 12, 2024	Instruction Following	Unverified	0
SeRA: Self-Reviewing and Alignment of Large Language Models using Implicit Reward Margins	Oct 12, 2024	Instruction Following	Unverified	0
Nudging: Inference-time Alignment of LLMs via Guided Decoding	Oct 11, 2024	General Knowledge, GSM8K	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified
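
The page does not define "Inst-level loose-accuracy". The sketch below assumes it follows the IFEval-style convention: each instruction attached to a prompt is checked separately, a response counts as following an instruction if the original text or any relaxed variant of it (first line dropped, last line dropped, markdown asterisks removed) passes the check, and the score is the percentage of instructions followed across all prompts. The function names and the toy checks are hypothetical and only illustrate the arithmetic.

```python
from typing import Callable, List, Tuple

# A check takes a response string and returns True if the instruction is satisfied.
Check = Callable[[str], bool]


def loose_variants(response: str) -> List[str]:
    """Return the response plus relaxed variants (assumed IFEval-style)."""
    lines = response.split("\n")
    no_first = "\n".join(lines[1:]).strip()
    no_last = "\n".join(lines[:-1]).strip()
    no_both = "\n".join(lines[1:-1]).strip()
    base = [response, no_first, no_last, no_both]
    # Also try every variant with markdown asterisks stripped.
    return base + [v.replace("*", "") for v in base]


def follows_loose(response: str, check: Check) -> bool:
    """True if any non-empty variant of the response passes the check."""
    return any(bool(v.strip()) and check(v) for v in loose_variants(response))


def inst_level_loose_accuracy(samples: List[Tuple[str, List[Check]]]) -> float:
    """Percentage of individual instructions followed across all samples."""
    total = 0
    followed = 0
    for response, checks in samples:
        for check in checks:
            total += 1
            followed += int(follows_loose(response, check))
    return 100.0 * followed / total if total else 0.0


# Toy data: two prompts carrying three instructions in total (hypothetical checks).
samples = [
    ("Sure, here you go:\nHELLO WORLD",
     [str.isupper, lambda r: len(r.split()) <= 2]),
    ("hello", [str.isupper]),
]
accuracy = inst_level_loose_accuracy(samples)
print(f"{accuracy:.2f}")
```

In the toy data, the first response passes both of its checks only after the introductory line is dropped, which is the kind of case the loose criterion is meant to forgive, while the second response fails its single check.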