| LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Aug 15, 2022 | GPU, Language Modelling | Code Available | 5 | 5 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Oct 11, 2018 | Citation Intent Classification, Common Sense Reasoning | Code Available | 3 | 5 |
| ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | Jul 29, 2019 | Chinese Named Entity Recognition, Chinese Reading Comprehension | Code Available | 3 | 5 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Oct 23, 2019 | Answer Generation, Common Sense Reasoning | Code Available | 2 | 5 |
| Fietje: An open, efficient LLM for Dutch | Dec 19, 2024 | Linguistic Acceptability, Sentiment Analysis | Code Available | 2 | 5 |
| ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Sep 26, 2019 | Common Sense Reasoning, GPU | Code Available | 2 | 5 |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | Jun 5, 2020 | Common Sense Reasoning, Coreference Resolution | Code Available | 2 | 5 |
| Q8BERT: Quantized 8Bit BERT | Oct 14, 2019 | Linguistic Acceptability, Natural Language Inference | Code Available | 1 | 5 |
| A Statistical Framework for Low-bitwidth Training of Deep Neural Networks | Oct 27, 2020 | Linguistic Acceptability, Natural Language Inference | Code Available | 1 | 5 |
| Big Bird: Transformers for Longer Sequences | Jul 28, 2020 | Linguistic Acceptability, Natural Language Inference | Code Available | 1 | 5 |
| Charformer: Fast Character Transformers via Gradient-based Subword Tokenization | Jun 23, 2021 | Inductive Bias, Linguistic Acceptability | Code Available | 1 | 5 |
| ChatGPT: Jack of all trades, master of none | Feb 21, 2023 | All, Chatbot | Code Available | 1 | 5 |
| data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | Feb 7, 2022 | image-classification, Image Classification | Code Available | 1 | 5 |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Oct 2, 2019 | Hate Speech Detection, Knowledge Distillation | Code Available | 1 | 5 |
| Entailment as Few-Shot Learner | Apr 29, 2021 | Contrastive Learning, Data Augmentation | Code Available | 1 | 5 |
| FNet: Mixing Tokens with Fourier Transforms | May 9, 2021 | Linguistic Acceptability, Machine Translation | Code Available | 1 | 5 |
| GeDi: Generative Discriminator Guided Sequence Generation | Sep 14, 2020 | Attribute, Linguistic Acceptability | Code Available | 1 | 5 |
| How to Train BERT with an Academic Budget | Apr 15, 2021 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| RealFormer: Transformer Likes Residual Attention | Dec 21, 2020 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| JCoLA: Japanese Corpus of Linguistic Acceptability | Sep 22, 2023 | Articles, Linguistic Acceptability | Code Available | 1 | 5 |
| Learning to Encode Position for Transformer with Continuous Dynamical Model | Mar 13, 2020 | Inductive Bias, Linguistic Acceptability | Code Available | 1 | 5 |
| LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning | May 29, 2023 | Contrastive Learning, Data Augmentation | Code Available | 1 | 5 |
| On the Robustness of Language Encoders against Grammatical Errors | May 12, 2020 | Cloze Test, Linguistic Acceptability | Code Available | 1 | 5 |
| Masked Language Model Scoring | Oct 31, 2019 | Attribute, Domain Adaptation | Code Available | 1 | 5 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | Jul 26, 2019 | Common Sense Reasoning, Document Image Classification | Code Available | 1 | 5 |
| RuCoLA: Russian Corpus of Linguistic Acceptability | Oct 23, 2022 | Linguistic Acceptability, Text Generation | Code Available | 1 | 5 |
| ScandEval: A Benchmark for Scandinavian Natural Language Processing | Apr 3, 2023 | Benchmarking, Cross-Lingual Transfer | Code Available | 1 | 5 |
| SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | Nov 8, 2019 | Linguistic Acceptability, Natural Language Inference | Code Available | 1 | 5 |
| Synthesizer: Rethinking Self-Attention in Transformer Models | May 2, 2020 | Abstractive Text Summarization, Dialogue Generation | Code Available | 1 | 5 |
| tasksource: A Dataset Harmonization Framework for Streamlined NLP Multi-Task Learning and Evaluation | Jan 14, 2023 | Language Modelling, Linguistic Acceptability | Code Available | 1 | 5 |
| Towards Debiasing Sentence Representations | Jul 16, 2020 | Linguistic Acceptability, Natural Language Understanding | Code Available | 1 | 5 |
| Language Models Use Monotonicity to Assess NPI Licensing | May 28, 2021 | Diagnostic, Linguistic Acceptability | Code Available | 0 | 5 |
| CUE: An Uncertainty Interpretation Framework for Text Classifiers Built on Pre-Trained Language Models | Jun 6, 2023 | Emotion Classification, Linguistic Acceptability | Code Available | 0 | 5 |
| MELA: Multilingual Evaluation of Linguistic Acceptability | Nov 15, 2023 | Code Generation, Cross-Lingual Transfer | Code Available | 0 | 5 |
| Monolingual and Cross-Lingual Acceptability Judgments with the Italian CoLA corpus | Sep 24, 2021 | CoLA, domain classification | Code Available | 0 | 5 |
| TinyBERT: Distilling BERT for Natural Language Understanding | Sep 23, 2019 | Knowledge Distillation, Language Modelling | Code Available | 0 | 5 |
| Multi-Task Deep Neural Networks for Natural Language Understanding | Jan 31, 2019 | Domain Adaptation, Language Modeling | Code Available | 0 | 5 |
| Can BERT eat RuCoLA? Topological Data Analysis to Explain | Apr 4, 2023 | CoLA, Linguistic Acceptability | Code Available | 0 | 5 |
| Natural Language Generation for Effective Knowledge Distillation | Nov 1, 2019 | Knowledge Distillation, Linguistic Acceptability | Code Available | 0 | 5 |
| Neural Network Acceptability Judgments | May 31, 2018 | CoLA, General Classification | Code Available | 0 | 5 |
| General Cross-Architecture Distillation of Pretrained Language Models into Matrix Embeddings | Sep 17, 2021 | CoLA, Linguistic Acceptability | Code Available | 0 | 5 |
| Domain Adversarial Fine-Tuning as an Effective Regularizer | Sep 28, 2020 | Linguistic Acceptability, Natural Language Understanding | Code Available | 0 | 5 |
| NoCoLA: The Norwegian Corpus of Linguistic Acceptability | Jun 13, 2023 | Binary Classification, Diagnostic | Code Available | 0 | 5 |
| VALUE: Understanding Dialect Disparity in NLU | Apr 6, 2022 | Linguistic Acceptability, Natural Language Understanding | Code Available | 0 | 5 |
| ERNIE: Enhanced Language Representation with Informative Entities | May 17, 2019 | Entity Linking, Entity Typing | Code Available | 0 | 5 |
| SpanBERT: Improving Pre-training by Representing and Predicting Spans | Jul 24, 2019 | Coreference Resolution, Linguistic Acceptability | Code Available | 0 | 5 |
| SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | Jun 19, 2020 | Linguistic Acceptability, Natural Language Inference | Code Available | 0 | 5 |
| Revisiting Acceptability Judgements | May 23, 2023 | Cross-Lingual Transfer, Linguistic Acceptability | Code Available | 0 | 5 |
| Acceptability Judgements via Examining the Topology of Attention Maps | May 19, 2022 | CoLA, Linguistic Acceptability | Code Available | 0 | 5 |
| Rating Distributions and Bayesian Inference: Enhancing Cognitive Models of Spatial Language Use | Jul 1, 2018 | Bayesian Inference, Linguistic Acceptability | Unverified | 0 | 0 |