| Context Is Not Comprehension | Jun 5, 2025 | ListOps | Unverified | 0 |
| Small Models, Smarter Learning: The Power of Joint Task Training | May 23, 2025 | ListOps | Unverified | 0 |
| Investigating Recurrent Transformers with Dynamic Halt | Feb 1, 2024 | Diagnostic, Language Modeling | Code Available | 0 |
| Cached Transformers: Improving Transformers with Differentiable Memory Cache | Dec 20, 2023 | Image Classification | Code Available | 1 |
| Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks | Jun 21, 2023 | Language Modelling, ListOps | Code Available | 0 |
| Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI | Jun 21, 2023 | General Knowledge, ListOps | Unverified | 0 |
| Beam Tree Recursive Cells | May 31, 2023 | ListOps | Code Available | 0 |
| Sequence Modeling with Multiresolution Convolutional Memory | May 2, 2023 | Density Estimation, ListOps | Code Available | 1 |
| DARTFormer: Finding The Best Type Of Attention | Oct 2, 2022 | ListOps, Neural Architecture Search | Unverified | 0 |
| Mega: Moving Average Equipped Gated Attention | Sep 21, 2022 | Image Classification, Inductive Bias | Code Available | 2 |
| Simplified State Space Layers for Sequence Modeling | Aug 9, 2022 | Computational Efficiency, ListOps | Code Available | 2 |
| Training Discrete Deep Generative Models via Gapped Straight-Through Estimator | Jun 15, 2022 | ListOps, reinforcement-learning | Code Available | 1 |
| Dynamic Token Normalization Improves Vision Transformers | Dec 5, 2021 | Inductive Bias, ListOps | Code Available | 1 |
| ORCHARD: A Benchmark For Measuring Systematic Generalization of Multi-Hierarchical Reasoning | Nov 28, 2021 | Diagnostic, ListOps | Code Available | 0 |
| Efficiently Modeling Long Sequences with Structured State Spaces | Oct 31, 2021 | Data Augmentation, Language Modeling | Code Available | 1 |
| The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization | Oct 14, 2021 | ListOps, Systematic Generalization | Code Available | 1 |
| Adaptive Control Flow in Transformers Improves Systematic Generalization | Sep 29, 2021 | ListOps, Systematic Generalization | Unverified | 0 |
| Going Beyond Linear Transformers with Recurrent Fast Weight Programmers | Jun 11, 2021 | Atari Games, ListOps | Code Available | 1 |
| Modeling Hierarchical Structures with Continuous Recursive Neural Networks | Jun 10, 2021 | ListOps, Natural Language Inference | Code Available | 1 |
| Long Range Arena: A Benchmark for Efficient Transformers | Nov 8, 2020 | 16k, Benchmarking | Code Available | 1 |
| Ordered Memory | Oct 29, 2019 | ListOps | Code Available | 0 |
| ListOps: A Diagnostic Dataset for Latent Tree Learning | Apr 17, 2018 | Diagnostic, ListOps | Code Available | 1 |