| CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers | Apr 28, 2022 | Image Generation, Language Modeling | Code Available | 2 |
| DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings | Apr 21, 2022 | Contrastive Learning, Language Modeling | Code Available | 2 |
| PaLM: Scaling Language Modeling with Pathways | Apr 5, 2022 | Auto Debugging, Code Generation | Code Available | 2 |
| Do As I Can, Not As I Say: Grounding Language in Robotic Affordances | Apr 4, 2022 | Decision Making, Language Modeling | Code Available | 2 |
| PromptDet: Towards Open-vocabulary Detection using Uncurated Images | Mar 30, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| LinkBERT: Pretraining Language Models with Document Links | Mar 29, 2022 | Document Classification, Language Modeling | Code Available | 2 |
| STaR: Bootstrapping Reasoning With Reasoning | Mar 28, 2022 | Common Sense Reasoning, Language Modeling | Code Available | 2 |
| Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5) | Mar 24, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| Memorizing Transformers | Mar 16, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| PERT: Pre-training BERT with Permuted Language Model | Mar 14, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| Block-Recurrent Transformers | Mar 11, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models | Mar 4, 2022 | Decoder, GPU | Code Available | 2 |
| Contextual Semantic Embeddings for Ontology Subsumption Prediction | Feb 20, 2022 | Knowledge Graph Embeddings, Language Modeling | Code Available | 2 |
| Online Decision Transformer | Feb 11, 2022 | D4RL, Efficient Exploration | Code Available | 2 |
| ProteinBERT: a universal deep-learning model of protein sequence and function | Feb 10, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| TimeLMs: Diachronic Language Models from Twitter | Feb 8, 2022 | Continual Learning, Language Modeling | Code Available | 2 |
| Cedille: A large autoregressive French language model | Feb 7, 2022 | Few-Shot Learning, Language Modeling | Code Available | 2 |
| Pre-Trained Language Models for Interactive Decision-Making | Feb 3, 2022 | Decision Making, Imitation Learning | Code Available | 2 |
| Formal Mathematics Statement Curriculum Learning | Feb 3, 2022 | Automated Theorem Proving, Language Modeling | Code Available | 2 |
| Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval | Jan 28, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| Synchromesh: Reliable code generation from pre-trained language models | Jan 26, 2022 | Code Generation, Language Modeling | Code Available | 2 |
| Black-Box Tuning for Language-Model-as-a-Service | Jan 10, 2022 | In-Context Learning, Language Modeling | Code Available | 2 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Dec 8, 2021 | Abstract Algebra, Anachronisms | Code Available | 2 |
| ClipCap: CLIP Prefix for Image Captioning | Nov 18, 2021 | Image Captioning, Language Modeling | Code Available | 2 |
| DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing | Nov 18, 2021 | Language Modeling, Language Modelling | Code Available | 2 |
| P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks | Oct 14, 2021 | Language Modeling, Language Modelling | Code Available | 2 |
| Deduplicating Training Data Makes Language Models Better | Jul 14, 2021 | Language Modeling, Language Modelling | Code Available | 2 |
| FastMoE: A Fast Mixture-of-Expert Training System | Mar 24, 2021 | GPU, Language Modeling | Code Available | 2 |
| GPT Understands, Too | Mar 18, 2021 | Knowledge Probing, Language Modeling | Code Available | 2 |
| When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | Feb 24, 2021 | GPU, Language Modeling | Code Available | 2 |
| Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | Jan 28, 2021 | image-classification, Image Classification | Code Available | 2 |
| The Pile: An 800GB Dataset of Diverse Text for Language Modeling | Dec 31, 2020 | Diversity, Language Modeling | Code Available | 2 |
| Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification | Oct 26, 2020 | Few-Shot Text Classification, General Classification | Code Available | 2 |
| AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients | Oct 15, 2020 | image-classification, Image Classification | Code Available | 2 |
| Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity | Jul 29, 2020 | Language Modeling, Language Modelling | Code Available | 2 |
| Simplifying Paragraph-level Question Generation via Transformer Language Models | May 3, 2020 | Language Modeling, Language Modelling | Code Available | 2 |
| MPNet: Masked and Permuted Pre-training for Language Understanding | Apr 20, 2020 | Language Modeling, Language Modelling | Code Available | 2 |
| BAE: BERT-based Adversarial Examples for Text Classification | Apr 4, 2020 | Adversarial Attack, Adversarial Text | Code Available | 2 |
| Self-Supervised Log Parsing | Mar 17, 2020 | Anomaly Detection, Fault Detection | Code Available | 2 |
| CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model | Mar 3, 2020 | 8k, Language Modeling | Code Available | 2 |
| Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Sep 17, 2019 | GPU, LAMBADA | Code Available | 2 |
| MASS: Masked Sequence to Sequence Pre-training for Language Generation | May 7, 2019 | Conversational Response Generation, Decoder | Code Available | 2 |
| Knowledge Representation Learning: A Quantitative Review | Dec 28, 2018 | General Classification, Information Retrieval | Code Available | 2 |
| Training RNNs as Fast as CNNs | Jan 1, 2018 | General Classification, Language Modeling | Code Available | 2 |
| Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | Jan 23, 2017 | Computational Efficiency, GPU | Code Available | 2 |
| End-To-End Memory Networks | Mar 31, 2015 | Language Modeling, Language Modelling | Code Available | 2 |
| InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing | Jul 16, 2025 | Domain Generalization, Face Anti-Spoofing | Code Available | 1 |
| Describe Anything Model for Visual Question Answering on Text-rich Images | Jul 16, 2025 | Descriptive, Language Modeling | Code Available | 1 |
| Evaluating Morphological Alignment of Tokenizers in 70 Languages | Jul 8, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Differential Mamba | Jul 8, 2025 | Language Modeling, Language Modelling | Code Available | 1 |