| Masked and Permuted Implicit Context Learning for Scene Text Recognition | May 25, 2023 | DecoderLanguage Modeling | CodeCode Available | 0 | 5 |
| Honey, I Shrunk the Language: Language Model Behavior at Reduced Scale | May 26, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| Masked Language Models are Good Heterogeneous Graph Generalizers | Jun 6, 2025 | Graph LearningLanguage Modeling | CodeCode Available | 0 | 5 |
| Masked Latent Semantic Modeling: an Efficient Pre-training Alternative to Masked Language Modeling | Jul 7, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| GMAT: Global Memory Augmentation for Transformers | Jun 5, 2020 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| An Investigation of Noise in Morphological Inflection | May 26, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| Data Augmentation for Biomedical Factoid Question Answering | Apr 10, 2022 | Data AugmentationInformation Retrieval | CodeCode Available | 0 | 5 |
| How transformers learn structured data: insights from hierarchical filtering | Aug 27, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| I-BERT: Inductive Generalization of Transformer to Arbitrary Context Lengths | Jun 18, 2020 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More | Feb 11, 2025 | DecoderInformation Retrieval | CodeCode Available | 0 | 5 |