| Title | Date | Tasks | Code | # |
| --- | --- | --- | --- | --- |
| What to Hide from Your Students: Attention-Guided Masked Image Modeling | Mar 23, 2022 | Language Modeling | Code Available | 1 |
| HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation | Mar 22, 2022 | Decision Making, Language Modeling | Code Available | 1 |
| How does the pre-training objective affect what large language models learn about linguistic properties? | Mar 20, 2022 | Language Modeling | Code Available | 1 |
| Geographic Adaptation of Pretrained Language Models | Mar 16, 2022 | Language Identification, Language Modeling | Code Available | 0 |
| SkillNet-NLU: A Sparsely Activated Model for General-Purpose Natural Language Understanding | Mar 7, 2022 | Language Modeling, Masked Language Modeling | Unverified | 0 |
| "Is Whole Word Masking Always Better for Chinese BERT?": Probing on Chinese Grammatical Error Correction | Mar 1, 2022 | Grammatical Error Correction, Language Modeling | Unverified | 0 |
| Probing BERT's priors with serial reproduction chains | Feb 24, 2022 | Language Modeling, Masked Language Modeling | Code Available | 0 |
| VU-BERT: A Unified Framework for Visual Dialog | Feb 22, 2022 | Language Modeling | Unverified | 0 |
| Transformer Quality in Linear Time | Feb 21, 2022 | Language Modeling | Code Available | 1 |
| Should You Mask 15% in Masked Language Modeling? | Feb 16, 2022 | Language Modeling | Code Available | 1 |