| The Lottery Ticket Hypothesis for Pre-trained BERT Networks | Jul 23, 2020 | Language Modeling, Language Modelling | Code Available | 1 |
| Language-agnostic BERT Sentence Embedding | Jul 3, 2020 | Language Modeling, Language Modelling | Code Available | 1 |
| Pre-training via Paraphrasing | Jun 26, 2020 | Document Summarization, Document Translation | Code Available | 1 |
| MC-BERT: Efficient Language Pre-Training via a Meta Controller | Jun 10, 2020 | Binary Classification, Cloze Test | Code Available | 1 |
| Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP | May 29, 2020 | Dependency Parsing, Language Modeling | Code Available | 1 |
| HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training | May 1, 2020 | Language Modeling, Language Modelling | Code Available | 1 |
| Segatron: Segment-Aware Transformer for Language Modeling and Understanding | Apr 30, 2020 | Language Modeling, Language Modelling | Code Available | 1 |
| Train No Evil: Selective Masking for Task-Guided Pre-Training | Apr 21, 2020 | Language Modeling, Language Modelling | Code Available | 1 |
| TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue | Apr 15, 2020 | Dialogue State Tracking, Intent Detection | Code Available | 1 |
| ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | Mar 23, 2020 | GPU, Language Modeling | Code Available | 1 |
| Talking-Heads Attention | Mar 5, 2020 | Language Modeling, Language Modelling | Code Available | 1 |
| REALM: Retrieval-Augmented Language Model Pre-Training | Feb 10, 2020 | Language Modeling, Language Modelling | Code Available | 1 |
| UNITER: UNiversal Image-TExt Representation Learning | Sep 25, 2019 | Image-text matching, Image-text Retrieval | Code Available | 1 |
| LXMERT: Learning Cross-Modality Encoder Representations from Transformers | Aug 20, 2019 | Language Modeling, Language Modelling | Code Available | 1 |
| Mask-Predict: Parallel Decoding of Conditional Masked Language Models | Apr 19, 2019 | Language Modeling, Language Modelling | Code Available | 1 |
| GeoRecon: Graph-Level Representation Learning for 3D Molecules via Reconstruction-Based Pretraining | Jun 16, 2025 | Denoising, Language Modeling | Unverified | 0 |
| Masked Language Models are Good Heterogeneous Graph Generalizers | Jun 6, 2025 | Graph Learning, Language Modeling | Code Available | 0 |
| Improving Low-Resource Morphological Inflection via Self-Supervised Objectives | Jun 5, 2025 | Decoder, Language Modeling | Unverified | 0 |
| HAD: Hybrid Architecture Distillation Outperforms Teacher in Genomic Sequence Modeling | May 27, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Ankh3: Multi-Task Pretraining with Sequence Denoising and Completion Enhances Protein Representations | May 26, 2025 | Denoising, Language Modeling | Unverified | 0 |
| ADALog: Adaptive Unsupervised Anomaly detection in Logs with Self-attention Masked Language Model | May 15, 2025 | Anomaly Detection, Language Modeling | Unverified | 0 |
| CodeSSM: Towards State Space Models for Code Understanding | May 2, 2025 | Clone Detection, Language Modeling | Unverified | 0 |
| In-Context Learning can distort the relationship between sequence likelihoods and biological fitness | Apr 23, 2025 | In-Context Learning, Language Modeling | Unverified | 0 |
| Low-Resource Transliteration for Roman-Urdu and Urdu Using Transformer-Based Models | Mar 27, 2025 | Information Retrieval, Language Modeling | Unverified | 0 |
| Enhancing Domain-Specific Encoder Models with LLM-Generated Data: How to Leverage Ontologies, and How to Do Without Them | Mar 27, 2025 | Continual Pretraining, Language Modeling | Unverified | 0 |