| LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Aug 15, 2022 | GPU, Language Modelling | Code Available | 5 |
| ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | Jul 29, 2019 | Chinese Named Entity Recognition, Chinese Reading Comprehension | Code Available | 3 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Oct 11, 2018 | Citation Intent Classification, Common Sense Reasoning | Code Available | 3 |
| Fietje: An open, efficient LLM for Dutch | Dec 19, 2024 | Linguistic Acceptability, Sentiment Analysis | Code Available | 2 |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | Jun 5, 2020 | Common Sense Reasoning, Coreference Resolution | Code Available | 2 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Oct 23, 2019 | Answer Generation, Common Sense Reasoning | Code Available | 2 |
| ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Sep 26, 2019 | Common Sense Reasoning, GPU | Code Available | 2 |
| JCoLA: Japanese Corpus of Linguistic Acceptability | Sep 22, 2023 | Articles, Linguistic Acceptability | Code Available | 1 |
| LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning | May 29, 2023 | Contrastive Learning, Data Augmentation | Code Available | 1 |
| ScandEval: A Benchmark for Scandinavian Natural Language Processing | Apr 3, 2023 | Benchmarking, Cross-Lingual Transfer | Code Available | 1 |
| ChatGPT: Jack of all trades, master of none | Feb 21, 2023 | All, Chatbot | Code Available | 1 |
| tasksource: A Dataset Harmonization Framework for Streamlined NLP Multi-Task Learning and Evaluation | Jan 14, 2023 | Language Modelling, Linguistic Acceptability | Code Available | 1 |
| RuCoLA: Russian Corpus of Linguistic Acceptability | Oct 23, 2022 | Linguistic Acceptability, Text Generation | Code Available | 1 |
| data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | Feb 7, 2022 | image-classification, Image Classification | Code Available | 1 |
| Charformer: Fast Character Transformers via Gradient-based Subword Tokenization | Jun 23, 2021 | Inductive Bias, Linguistic Acceptability | Code Available | 1 |
| FNet: Mixing Tokens with Fourier Transforms | May 9, 2021 | Linguistic Acceptability, Machine Translation | Code Available | 1 |
| Entailment as Few-Shot Learner | Apr 29, 2021 | Contrastive Learning, Data Augmentation | Code Available | 1 |
| How to Train BERT with an Academic Budget | Apr 15, 2021 | Language Modeling, Language Modelling | Code Available | 1 |
| RealFormer: Transformer Likes Residual Attention | Dec 21, 2020 | Language Modeling, Language Modelling | Code Available | 1 |
| A Statistical Framework for Low-bitwidth Training of Deep Neural Networks | Oct 27, 2020 | Linguistic Acceptability, Natural Language Inference | Code Available | 1 |
| GeDi: Generative Discriminator Guided Sequence Generation | Sep 14, 2020 | Attribute, Linguistic Acceptability | Code Available | 1 |
| Big Bird: Transformers for Longer Sequences | Jul 28, 2020 | Linguistic Acceptability, Natural Language Inference | Code Available | 1 |
| Towards Debiasing Sentence Representations | Jul 16, 2020 | Linguistic Acceptability, Natural Language Understanding | Code Available | 1 |
| On the Robustness of Language Encoders against Grammatical Errors | May 12, 2020 | Cloze Test, Linguistic Acceptability | Code Available | 1 |
| Synthesizer: Rethinking Self-Attention in Transformer Models | May 2, 2020 | Abstractive Text Summarization, Dialogue Generation | Code Available | 1 |
| Learning to Encode Position for Transformer with Continuous Dynamical Model | Mar 13, 2020 | Inductive Bias, Linguistic Acceptability | Code Available | 1 |
| SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | Nov 8, 2019 | Linguistic Acceptability, Natural Language Inference | Code Available | 1 |
| Masked Language Model Scoring | Oct 31, 2019 | Attribute, Domain Adaptation | Code Available | 1 |
| Q8BERT: Quantized 8Bit BERT | Oct 14, 2019 | Linguistic Acceptability, Natural Language Inference | Code Available | 1 |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | Oct 2, 2019 | Hate Speech Detection, Knowledge Distillation | Code Available | 1 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | Jul 26, 2019 | Common Sense Reasoning, Document Image Classification | Code Available | 1 |
| Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective | Jun 5, 2025 | Linguistic Acceptability, named-entity-recognition | —Unverified | 0 |
| Robust ASR Error Correction with Conservative Data Filtering | Jul 18, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Learning Phonotactics from Linguistic Informants | May 8, 2024 | Linguistic Acceptability | —Unverified | 0 |
| MELA: Multilingual Evaluation of Linguistic Acceptability | Nov 15, 2023 | Code Generation, Cross-Lingual Transfer | Code Available | 0 |
| Not all layers are equally as important: Every Layer Counts BERT | Nov 3, 2023 | All, Linguistic Acceptability | —Unverified | 0 |
| Data-Free Distillation of Language Model by Text-to-Text Transfer | Nov 3, 2023 | Data-free Knowledge Distillation, Diversity | —Unverified | 0 |
| How well can machine-generated texts be identified and can language models be trained to avoid identification? | Oct 25, 2023 | Linguistic Acceptability, Text Generation | —Unverified | 0 |
| Defense of Adversarial Ranking Attack in Text Retrieval: Benchmark and Baseline via Detection | Jul 31, 2023 | Adversarial Attack, Information Retrieval | —Unverified | 0 |
| A Neural-Symbolic Approach Towards Identifying Grammatically Correct Sentences | Jul 16, 2023 | Articles, CoLA | —Unverified | 0 |
| NoCoLA: The Norwegian Corpus of Linguistic Acceptability | Jun 13, 2023 | Binary Classification, Diagnostic | Code Available | 0 |
| CUE: An Uncertainty Interpretation Framework for Text Classifiers Built on Pre-Trained Language Models | Jun 6, 2023 | Emotion Classification, Linguistic Acceptability | Code Available | 0 |
| Revisiting Acceptability Judgements | May 23, 2023 | Cross-Lingual Transfer, Linguistic Acceptability | Code Available | 0 |
| Can BERT eat RuCoLA? Topological Data Analysis to Explain | Apr 4, 2023 | CoLA, Linguistic Acceptability | Code Available | 0 |
| Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE | Feb 18, 2023 | Contrastive Learning, Denoising | —Unverified | 0 |
| Acceptability Judgements via Examining the Topology of Attention Maps | May 19, 2022 | CoLA, Linguistic Acceptability | Code Available | 0 |
| VALUE: Understanding Dialect Disparity in NLU | Apr 6, 2022 | Linguistic Acceptability, Natural Language Understanding | Code Available | 0 |
| Cross-Architecture Distillation Using Bidirectional CMOW Embeddings | Sep 29, 2021 | Linguistic Acceptability, QQP | —Unverified | 0 |
| Monolingual and Cross-Lingual Acceptability Judgments with the Italian CoLA corpus | Sep 24, 2021 | CoLA, domain classification | Code Available | 0 |
| Revisiting the Uniform Information Density Hypothesis | Sep 23, 2021 | Linguistic Acceptability, Sentence | —Unverified | 0 |