Instruction Following

Instruction following is a core capability of a language model. This task evaluates how well a large model follows human instructions, with the goal of producing controllable and safe answers.

Papers

Showing 861–870 of 1135 papers

Title	Date	Tasks	Status
OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment	Feb 19, 2025	Hallucination, Instruction Following	Unverified
Open-World Skill Discovery from Unsegmented Demonstrations	Mar 11, 2025	Boundary Detection, Event Segmentation	Unverified
OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following	Mar 5, 2024	Instruction Following	Unverified
Enhancing and Assessing Instruction-Following with Fine-Grained Instruction Variants	Jun 17, 2024	Data Augmentation, Diversity	Unverified
Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search	Oct 14, 2024	Instruction Following	Unverified
Optimizing Latent Goal by Learning from Trajectory Preference	Dec 3, 2024	Continual Learning, Instruction Following	Unverified
OPTune: Efficient Online Preference Tuning	Jun 11, 2024	Instruction Following	Unverified
Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models	Jul 11, 2024	Instruction Following	Unverified
Better Instruction-Following Through Minimum Bayes Risk	Oct 3, 2024	Instruction Following	Unverified
PanGEA: The Panoramic Graph Environment Annotation Toolkit	Mar 23, 2021	Instruction Following	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	AutoIF (Llama3 70B)	Inst-level loose-accuracy	90.4	—	Unverified
2	AutoIF (Qwen2 72B)	Inst-level loose-accuracy	88	—	Unverified
3	GPT-4	Inst-level loose-accuracy	85.37	—	Unverified
4	PaLM 2 S	Inst-level loose-accuracy	59.11	—	Unverified