| Title | Date | Tasks | Status |
|---|---|---|---|
| Predicting Attention Sparsity in Transformers | Sep 24, 2021 | Decoder, Language Modeling | Unverified |
| Predicting Attention Sparsity in Transformers | Nov 16, 2021 | Decoder, Language Modeling | Unverified |
| Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge | Dec 16, 2021 | Language Modeling, Language Modelling | Unverified |
| Discovering Financial Hypernyms by Prompting Masked Language Models | Jun 1, 2022 | Domain Adaptation, Language Modeling | Unverified |
| Pre-Training and Prompting for Few-Shot Node Classification on Text-Attributed Graphs | Jul 22, 2024 | Few-Shot Learning, Graph Neural Network | Unverified |
| Pretraining Chinese BERT for Detecting Word Insertion and Deletion Errors | Apr 26, 2022 | Language Modeling, Language Modelling | Unverified |
| Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning | Apr 29, 2020 | All, HellaSwag | Unverified |
| Pre-training Language Model as a Multi-perspective Course Learner | May 6, 2023 | Language Modeling, Language Modelling | Unverified |
| UNITER: Learning UNiversal Image-TExt Representations | Sep 25, 2019 | Image-text Matching, Image-text Retrieval | Unverified |
| DICT-MLM: Improved Multilingual Pre-Training using Bilingual Dictionaries | Oct 23, 2020 | Language Modeling, Language Modelling | Unverified |