| LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Aug 15, 2022 | GPU, Language Modelling | Code Available | 5 |
| ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | Jul 29, 2019 | Chinese Named Entity Recognition, Chinese Reading Comprehension | Code Available | 3 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Oct 11, 2018 | Citation Intent Classification, Common Sense Reasoning | Code Available | 3 |
| ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Sep 26, 2019 | Common Sense Reasoning, GPU | Code Available | 2 |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | Jun 5, 2020 | Common Sense Reasoning, Coreference Resolution | Code Available | 2 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Oct 23, 2019 | Answer Generation, Common Sense Reasoning | Code Available | 2 |
| Fietje: An open, efficient LLM for Dutch | Dec 19, 2024 | Linguistic Acceptability, Sentiment Analysis | Code Available | 2 |
| Masked Language Model Scoring | Oct 31, 2019 | Attribute, Domain Adaptation | Code Available | 1 |
| FNet: Mixing Tokens with Fourier Transforms | May 9, 2021 | Linguistic Acceptability, Machine Translation | Code Available | 1 |
| Q8BERT: Quantized 8Bit BERT | Oct 14, 2019 | Linguistic Acceptability, Natural Language Inference | Code Available | 1 |
| Entailment as Few-Shot Learner | Apr 29, 2021 | Contrastive Learning, Data Augmentation | Code Available | 1 |
| ChatGPT: Jack of all trades, master of none | Feb 21, 2023 | All, Chatbot | Code Available | 1 |
| A Statistical Framework for Low-bitwidth Training of Deep Neural Networks | Oct 27, 2020 | Linguistic Acceptability, Natural Language Inference | Code Available | 1 |
| On the Robustness of Language Encoders against Grammatical Errors | May 12, 2020 | Cloze Test, Linguistic Acceptability | Code Available | 1 |
| Learning to Encode Position for Transformer with Continuous Dynamical Model | Mar 13, 2020 | Inductive Bias, Linguistic Acceptability | Code Available | 1 |
| JCoLA: Japanese Corpus of Linguistic Acceptability | Sep 22, 2023 | Articles, Linguistic Acceptability | Code Available | 1 |
| Big Bird: Transformers for Longer Sequences | Jul 28, 2020 | Linguistic Acceptability, Natural Language Inference | Code Available | 1 |
| data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | Feb 7, 2022 | image-classification, Image Classification | Code Available | 1 |
| How to Train BERT with an Academic Budget | Apr 15, 2021 | Language Modeling, Language Modelling | Code Available | 1 |
| GeDi: Generative Discriminator Guided Sequence Generation | Sep 14, 2020 | Attribute, Linguistic Acceptability | Code Available | 1 |
| Charformer: Fast Character Transformers via Gradient-based Subword Tokenization | Jun 23, 2021 | Inductive Bias, Linguistic Acceptability | Code Available | 1 |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Oct 2, 2019 | Hate Speech Detection, Knowledge Distillation | Code Available | 1 |
| RealFormer: Transformer Likes Residual Attention | Dec 21, 2020 | Language Modeling, Language Modelling | Code Available | 1 |
| LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning | May 29, 2023 | Contrastive Learning, Data Augmentation | Code Available | 1 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | Jul 26, 2019 | Common Sense Reasoning, Document Image Classification | Code Available | 1 |