Finetuned Language Models Are Zero-Shot Learners
Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
Official code (in paper): github.com/google-research/flan (TensorFlow)
Abstract
This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks. We take a 137B parameter pretrained language model and instruction-tune it on over 60 NLP datasets verbalized via natural language instruction templates. We evaluate this instruction-tuned model, which we call FLAN, on unseen task types. FLAN substantially improves the performance of its unmodified counterpart and surpasses zero-shot 175B GPT-3 on 20 of 25 tasks that we evaluate. FLAN even outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze. Ablation studies reveal that the number of finetuning datasets, model scale, and natural language instructions are key to the success of instruction tuning.
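To make the method concrete, the core data-construction step can be sketched as follows. This is an illustrative sketch only: the template wordings and the `verbalize` helper are hypothetical, not FLAN's actual templates, but they show the idea of rendering one task example into several natural-language instruction prompts whose (prompt, target) pairs are then mixed across tasks for finetuning.

```python
# Hypothetical instruction templates for a natural language inference (NLI)
# task; FLAN's real templates differ in wording.
NLI_TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis? Options: yes, maybe, no.",
    "{premise}\nBased on the paragraph above, can we conclude that "
    '"{hypothesis}"? Options: yes, maybe, no.',
]

# Map integer labels to natural-language answer strings.
LABELS = {0: "yes", 1: "maybe", 2: "no"}  # entailment / neutral / contradiction


def verbalize(example: dict) -> list:
    """Render one NLI example into (prompt, target) pairs, one per template."""
    target = LABELS[example["label"]]
    return [
        (
            template.format(
                premise=example["premise"], hypothesis=example["hypothesis"]
            ),
            target,
        )
        for template in NLI_TEMPLATES
    ]


pairs = verbalize(
    {
        "premise": "A dog is running in the park.",
        "hypothesis": "An animal is outdoors.",
        "label": 0,
    }
)
for prompt, target in pairs:
    print(prompt, "->", target)
```

Each source dataset contributes examples rendered through multiple such templates, and the model is finetuned on the union of these prompts with standard next-token prediction on the target string.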