| LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Aug 15, 2022 | GPU, Language Modelling | Code Available | 5 |
| ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | Jul 29, 2019 | Chinese Named Entity Recognition, Chinese Reading Comprehension | Code Available | 3 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Oct 11, 2018 | Citation Intent Classification, Common Sense Reasoning | Code Available | 3 |
| Fietje: An open, efficient LLM for Dutch | Dec 19, 2024 | Linguistic Acceptability, Sentiment Analysis | Code Available | 2 |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | Jun 5, 2020 | Common Sense Reasoning, Coreference Resolution | Code Available | 2 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Oct 23, 2019 | Answer Generation, Common Sense Reasoning | Code Available | 2 |
| ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Sep 26, 2019 | Common Sense Reasoning, GPU | Code Available | 2 |
| JCoLA: Japanese Corpus of Linguistic Acceptability | Sep 22, 2023 | Articles, Linguistic Acceptability | Code Available | 1 |
| LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning | May 29, 2023 | Contrastive Learning, Data Augmentation | Code Available | 1 |
| ScandEval: A Benchmark for Scandinavian Natural Language Processing | Apr 3, 2023 | Benchmarking, Cross-Lingual Transfer | Code Available | 1 |
| ChatGPT: Jack of all trades, master of none | Feb 21, 2023 | All, Chatbot | Code Available | 1 |
| tasksource: A Dataset Harmonization Framework for Streamlined NLP Multi-Task Learning and Evaluation | Jan 14, 2023 | Language Modelling, Linguistic Acceptability | Code Available | 1 |
| RuCoLA: Russian Corpus of Linguistic Acceptability | Oct 23, 2022 | Linguistic Acceptability, Text Generation | Code Available | 1 |
| data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | Feb 7, 2022 | image-classification, Image Classification | Code Available | 1 |
| Charformer: Fast Character Transformers via Gradient-based Subword Tokenization | Jun 23, 2021 | Inductive Bias, Linguistic Acceptability | Code Available | 1 |
| FNet: Mixing Tokens with Fourier Transforms | May 9, 2021 | Linguistic Acceptability, Machine Translation | Code Available | 1 |
| Entailment as Few-Shot Learner | Apr 29, 2021 | Contrastive Learning, Data Augmentation | Code Available | 1 |
| How to Train BERT with an Academic Budget | Apr 15, 2021 | Language Modeling, Language Modelling | Code Available | 1 |
| RealFormer: Transformer Likes Residual Attention | Dec 21, 2020 | Language Modeling, Language Modelling | Code Available | 1 |
| A Statistical Framework for Low-bitwidth Training of Deep Neural Networks | Oct 27, 2020 | Linguistic Acceptability, Natural Language Inference | Code Available | 1 |
| GeDi: Generative Discriminator Guided Sequence Generation | Sep 14, 2020 | Attribute, Linguistic Acceptability | Code Available | 1 |
| Big Bird: Transformers for Longer Sequences | Jul 28, 2020 | Linguistic Acceptability, Natural Language Inference | Code Available | 1 |
| Towards Debiasing Sentence Representations | Jul 16, 2020 | Linguistic Acceptability, Natural Language Understanding | Code Available | 1 |
| On the Robustness of Language Encoders against Grammatical Errors | May 12, 2020 | Cloze Test, Linguistic Acceptability | Code Available | 1 |
| Synthesizer: Rethinking Self-Attention in Transformer Models | May 2, 2020 | Abstractive Text Summarization, Dialogue Generation | Code Available | 1 |