SOTAVerified

Instruction Following

Instruction following is a fundamental capability of large language models. This task evaluates how well a model adheres to human instructions, with the goal of producing controllable and safe responses.

Papers

Showing 401–450 of 1135 papers

Title | Status | Hype
MergeBench: A Benchmark for Merging Domain-Specialized LLMs | Code | 1
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner | Code | 1
CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning | Code | 1
IHEval: Evaluating Language Models on Following the Instruction Hierarchy | Code | 1
LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation | Code | 1
A Recipe For Building a Compliant Real Estate Chatbot | Code | 1
Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer | Code | 1
Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text | Code | 1
Efficient Inference of Vision Instruction-Following Models with Elastic Cache | Code | 1
LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors | Code | 1
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs | Code | 1
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Code | 1
MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning | Code | 1
Can Language Models Follow Multiple Turns of Entangled Instructions? | Code | 1
EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing | Code | 1
LIONs: An Empirically Optimized Approach to Align Language Models | Code | 1
Lexicon Learning for Few-Shot Neural Sequence Modeling | Code | 1
Answer is All You Need: Instruction-following Text Embedding via Answering the Question | Code | 1
Lexicon Learning for Few Shot Sequence Modeling | Code | 1
Creative Agents: Empowering Agents with Imagination for Creative Tasks | Code | 1
An In-depth Look at Gemini's Language Abilities | Code | 1
Learning to Map Natural Language Instructions to Physical Quadcopter Control using Simulated Flight | Code | 1
LLaMo: Large Language Model-based Molecular Graph Assistant | Code | 1
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization | Code | 1
Don't Reinvent the Wheel: Efficient Instruction-Following Text Embedding based on Guided Space Transformation | Code | 1
Large Language Models as Evaluators for Recommendation Explanations | Code | 1
Lana: A Language-Capable Navigator for Instruction Following and Generation | Code | 1
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models | Code | 1
BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues | Code | 1
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs | Code | 1
Language-Conditioned Reinforcement Learning to Solve Misunderstandings with Action Corrections | Code | 1
Bactrian-X: Multilingual Replicable Instruction-Following Models with Low-Rank Adaptation | Code | 1
Do LLMs "know" internally when they follow instructions? | Code | 1
A Dual-Space Framework for General Knowledge Distillation of Large Language Models | Code | 1
Language Imbalance Driven Rewarding for Multilingual Self-improving | Code | 1
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits | Code | 1
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models | Code | 1
An Emulator for Fine-Tuning Large Language Models using Small Language Models | Code | 1
Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study | Code | 1
Jatmo: Prompt Injection Defense by Task-Specific Finetuning | Code | 1
Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation | Code | 1
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning | Code | 1
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | Code | 1
Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions | Code | 1
IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis | Code | 1
RMM: A Recursive Mental Model for Dialogue Navigation | Code | 1
DistilQwen2.5: Industrial Practices of Training Distilled Open Lightweight Language Models | — | 0
Distilling Internet-Scale Vision-Language Models into Embodied Agents | — | 0
Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning | — | 0
Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models | — | 0
Page 9 of 23

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | AutoIF (Llama3 70B) | Inst-level loose-accuracy | 90.4 | — | Unverified
2 | AutoIF (Qwen2 72B) | Inst-level loose-accuracy | 88 | — | Unverified
3 | GPT-4 | Inst-level loose-accuracy | 85.37 | — | Unverified
4 | PaLM 2 S | Inst-level loose-accuracy | 59.11 | — | Unverified