Instruction Following

Instruction following is a core capability of large language models. This task evaluates how well a model follows human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 351–400 of 1135 papers

Title	Date	Tasks	Status	Hype
On the Loss of Context-awareness in General Instruction Fine-tuning	Nov 5, 2024	Benchmarking, Instruction Following	Code Available	0
Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors	Nov 3, 2024	Instruction Following, RAG	Unverified	0
Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models	Nov 3, 2024	Hallucination, Instruction Following	Code Available	0
TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models	Nov 2, 2024	Image Description, Image Generation	Unverified	0
LLaMo: Large Language Model-based Molecular Graph Assistant	Oct 31, 2024	Instruction Following, IUPAC Name Prediction	Code Available	1
Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models	Oct 31, 2024	Instruction Following, Reranking	Code Available	0
Constraint Back-translation Improves Complex Instruction Following of Large Language Models	Oct 31, 2024	Instruction Following, Translation	Code Available	1
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following	Oct 30, 2024	Articles, Instruction Following	Code Available	0
FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system	Oct 28, 2024	Code Generation, HumanEval	Code Available	0
UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function	Oct 28, 2024	Instruction Following, Text Generation	Unverified	0
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models	Oct 25, 2024	Instruction Following, Knowledge Distillation	Unverified	0
Open6DOR: Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach	Oct 24, 2024	Benchmarking, Instruction Following	Code Available	2
BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning	Oct 24, 2024	Instruction Following, Natural Language Understanding	Unverified	0
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations	Oct 24, 2024	Instruction Following, Question Answering	Code Available	1
Unbounded: A Generative Infinite Game of Character Life Simulation	Oct 24, 2024	Instruction Following, Language Modelling	Unverified	0
Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks	Oct 23, 2024	Instruction Following, Safety Alignment	Unverified	0
Cross-model Control: Improving Multiple Large Language Models in One-time Training	Oct 23, 2024	Instruction Following, Language Modeling	Code Available	1
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models	Oct 23, 2024	Instruction Following, Language Modelling	Code Available	2
Cross-lingual Transfer of Reward Models in Multilingual Alignment	Oct 23, 2024	Cross-Lingual Transfer, Instruction Following	Code Available	0
SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains	Oct 23, 2024	Domain Adaptation, Instruction Following	Unverified	0
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning	Oct 23, 2024	Image Captioning, Instruction Following	Code Available	1
Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following	Oct 21, 2024	Benchmarking, Instruction Following	Code Available	2
GATEAU: Selecting Influential Samples for Long Context Alignment	Oct 21, 2024	Instruction Following, Long-Context Understanding	Code Available	1
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models	Oct 21, 2024	Instruction Following, object-detection	Unverified	0
Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Experiments, and Challenges	Oct 20, 2024	Autonomous Driving, Decision Making	Unverified	0
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound	Oct 19, 2024	Instruction Following, Knowledge Distillation	Unverified	0
LoGU: Long-form Generation with Uncertainty Expressions	Oct 18, 2024	Form, Instruction Following	Code Available	1
Do LLMs "know" internally when they follow instructions?	Oct 18, 2024	Instruction Following, Prompt Engineering	Code Available	1
Do LLMs estimate uncertainty well in instruction-following?	Oct 18, 2024	Instruction Following	Code Available	0
Boosting LLM Translation Skills without General Ability Loss via Rationale Distillation	Oct 17, 2024	General Knowledge, Instruction Following	Unverified	0
LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning	Oct 17, 2024	image-classification, Image Classification	Code Available	0
POROver: Improving Safety and Reducing Overrefusal in Large Language Models with Overgeneration and Preference Optimization	Oct 16, 2024	Instruction Following	Code Available	0
Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception	Oct 16, 2024	Binary Classification, Chunking	Code Available	3
Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks	Oct 16, 2024	Instruction Following, Multiple-choice	Code Available	0
RuleRAG: Rule-guided retrieval-augmented generation with language models for question answering	Oct 15, 2024	In-Context Learning, Instruction Following	Code Available	1
Improving Instruction-Following in Language Models through Activation Steering	Oct 15, 2024	Instruction Following, Text Generation	Unverified	0
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling	Oct 15, 2024	Instruction Following, Knowledge Distillation	Unverified	0
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding	Oct 15, 2024	Instruction Following, Visual Question Answering (VQA)	Unverified	0
Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs	Oct 14, 2024	Instruction Following	Unverified	0
How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective	Oct 14, 2024	Density Ratio Estimation, GSM8K	Code Available	0
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization	Oct 14, 2024	Explanation Generation, Image Forgery Detection	Unverified	0
Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search	Oct 14, 2024	Instruction Following	Unverified	0
DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model	Oct 14, 2024	Diversity, Instruction Following	Unverified	0
Thinking LLMs: General Instruction Following with Thought Generation	Oct 14, 2024	General Knowledge, Instruction Following	Unverified	0
Conversational Code Generation: a Case Study of Designing a Dialogue System for Generating Driving Scenarios for Testing Autonomous Vehicles	Oct 13, 2024	Autonomous Vehicles, Code Generation	Unverified	0
Surgical-LLaVA: Toward Surgical Scenario Understanding via Large Language and Vision Models	Oct 13, 2024	Instruction Following, Question Answering	Unverified	0
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation	Oct 12, 2024	Instruction Following, RAG	Code Available	2
Are You Human? An Adversarial Benchmark to Expose LLMs	Oct 12, 2024	Instruction Following	Unverified	0
SeRA: Self-Reviewing and Alignment of Large Language Models using Implicit Reward Margins	Oct 12, 2024	Instruction Following	Unverified	0
Nudging: Inference-time Alignment of LLMs via Guided Decoding	Oct 11, 2024	General Knowledge, GSM8K	Unverified	0


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified