| Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Sep 17, 2019 | GPU, LAMBADA | Code Available | 2 |
| MASS: Masked Sequence to Sequence Pre-training for Language Generation | May 7, 2019 | Conversational Response Generation, Decoder | Code Available | 2 |
| Knowledge Representation Learning: A Quantitative Review | Dec 28, 2018 | General Classification, Information Retrieval | Code Available | 2 |
| Training RNNs as Fast as CNNs | Jan 1, 2018 | General Classification, Language Modeling | Code Available | 2 |
| Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | Jan 23, 2017 | Computational Efficiency, GPU | Code Available | 2 |
| End-To-End Memory Networks | Mar 31, 2015 | Language Modeling, Language Modelling | Code Available | 2 |
| InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing | Jul 16, 2025 | Domain Generalization, Face Anti-Spoofing | Code Available | 1 |
| Describe Anything Model for Visual Question Answering on Text-rich Images | Jul 16, 2025 | Descriptive, Language Modeling | Code Available | 1 |
| Evaluating Morphological Alignment of Tokenizers in 70 Languages | Jul 8, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Differential Mamba | Jul 8, 2025 | Language Modeling, Language Modelling | Code Available | 1 |