| Debiasing the Cloze Task in Sequential Recommendation with Bidirectional Transformers | Jan 22, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| A Cohesive Distillation Architecture for Neural Language Models | Jan 12, 2023 | Knowledge DistillationLanguage Modeling | —Unverified | 0 |
| Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks | Jan 1, 2023 | Cross-Modal RetrievalImage Captioning | —Unverified | 0 |
| Cramming: Training a Language Model on a Single GPU in One Day | Dec 28, 2022 | GPULanguage Modeling | CodeCode Available | 3 |
| MicroBERT: Effective Training of Low-resource Monolingual BERTs through Parameter Reduction and Multitask Learning | Dec 23, 2022 | Dependency ParsingLanguage Modeling | CodeCode Available | 1 |
| Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language Models | Dec 20, 2022 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Mu^2SLAM: Multitask, Multilingual Speech and Language Models | Dec 19, 2022 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 |
| APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning | Dec 19, 2022 | Data AugmentationLanguage Modeling | —Unverified | 0 |
| Efficient Pre-training of Masked Language Model via Concept-based Curriculum Masking | Dec 15, 2022 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| Uniform Masking Prevails in Vision-Language Pretraining | Dec 10, 2022 | Image-text matchingLanguage Modeling | —Unverified | 0 |