| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Jun 17, 2024 | 16k, Language Modeling | Code Available | 9 |
| Global Structure-from-Motion Revisited | Jul 29, 2024 | 16k | Code Available | 7 |
| FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | May 27, 2022 | 16k, 4k | Code Available | 6 |
| Code Llama: Open Foundation Models for Code | Aug 24, 2023 | 16k, Code Generation | Code Available | 6 |
| Learning to (Learn at Test Time): RNNs with Expressive Hidden States | Jul 5, 2024 | 16k, 8k | Code Available | 5 |
| Long-form factuality in large language models | Mar 27, 2024 | 16k, Form | Code Available | 4 |
| 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | Aug 7, 2024 | 16k, 2k | Code Available | 3 |
| Investigating Efficiently Extending Transformers for Long Input Summarization | Aug 8, 2022 | 16k, Long-range modeling | Code Available | 3 |
| FlashDMoE: Fast Distributed MoE in a Single Kernel | Jun 5, 2025 | 16k, CPU | Code Available | 3 |
| Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | May 17, 2024 | 16k, Benchmarking | Code Available | 3 |