| Synchromesh: Reliable code generation from pre-trained language models | Jan 26, 2022 | Code GenerationLanguage Modeling | CodeCode Available | 2 |
| Black-Box Tuning for Language-Model-as-a-Service | Jan 10, 2022 | In-Context LearningLanguage Modeling | CodeCode Available | 2 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Dec 8, 2021 | Abstract AlgebraAnachronisms | CodeCode Available | 2 |
| DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing | Nov 18, 2021 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| ClipCap: CLIP Prefix for Image Captioning | Nov 18, 2021 | Image CaptioningLanguage Modeling | CodeCode Available | 2 |
| P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks | Oct 14, 2021 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Deduplicating Training Data Makes Language Models Better | Jul 14, 2021 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| FastMoE: A Fast Mixture-of-Expert Training System | Mar 24, 2021 | GPULanguage Modeling | CodeCode Available | 2 |
| GPT Understands, Too | Mar 18, 2021 | Knowledge ProbingLanguage Modeling | CodeCode Available | 2 |
| When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | Feb 24, 2021 | GPULanguage Modeling | CodeCode Available | 2 |