Instruction Following

Instruction following is a basic task for large language models. It evaluates how well a model follows human instructions, with the goal of generating controllable and safe answers.

Papers

Showing 351–375 of 1135 papers

Title	Date	Tasks	Status	Hype
On the Loss of Context-awareness in General Instruction Fine-tuning	Nov 5, 2024	Benchmarking, Instruction Following	Code Available	0
Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models	Nov 3, 2024	Hallucination, Instruction Following	Code Available	0
Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors	Nov 3, 2024	Instruction Following, RAG	Unverified	0
TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models	Nov 2, 2024	Image Description, Image Generation	Unverified	0
LLaMo: Large Language Model-based Molecular Graph Assistant	Oct 31, 2024	Instruction Following, IUPAC Name Prediction	Code Available	1
Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models	Oct 31, 2024	Instruction Following, Reranking	Code Available	0
Constraint Back-translation Improves Complex Instruction Following of Large Language Models	Oct 31, 2024	Instruction Following, Translation	Code Available	1
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following	Oct 30, 2024	Articles, Instruction Following	Code Available	0
FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system	Oct 28, 2024	Code Generation, HumanEval	Code Available	0
UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function	Oct 28, 2024	Instruction Following, Text Generation	Unverified	0
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models	Oct 25, 2024	Instruction Following, Knowledge Distillation	Unverified	0
Open6DOR: Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach	Oct 24, 2024	Benchmarking, Instruction Following	Code Available	2
BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning	Oct 24, 2024	Instruction Following, Natural Language Understanding	Unverified	0
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations	Oct 24, 2024	Instruction Following, Question Answering	Code Available	1
Unbounded: A Generative Infinite Game of Character Life Simulation	Oct 24, 2024	Instruction Following, Language Modelling	Unverified	0
Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks	Oct 23, 2024	Instruction Following, Safety Alignment	Unverified	0
Cross-model Control: Improving Multiple Large Language Models in One-time Training	Oct 23, 2024	Instruction Following, Language Modeling	Code Available	1
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models	Oct 23, 2024	Instruction Following, Language Modelling	Code Available	2
Cross-lingual Transfer of Reward Models in Multilingual Alignment	Oct 23, 2024	Cross-Lingual Transfer, Instruction Following	Code Available	0
SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains	Oct 23, 2024	Domain Adaptation, Instruction Following	Unverified	0
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning	Oct 23, 2024	Image Captioning, Instruction Following	Code Available	1
Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following	Oct 21, 2024	Benchmarking, Instruction Following	Code Available	2
GATEAU: Selecting Influential Samples for Long Context Alignment	Oct 21, 2024	Instruction Following, Long-Context Understanding	Code Available	1
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models	Oct 21, 2024	Instruction Following, object-detection	Unverified	0
Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Experiments, and Challenges	Oct 20, 2024	Autonomous Driving, Decision Making	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified