| DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Jan 22, 2025 | Mathematical Reasoning, Multi-task Language Understanding | Code Available | 15 |
| Qwen2.5 Technical Report | Dec 19, 2024 | Common Sense Reasoning | Code Available | 13 |
| Qwen3 Technical Report | May 14, 2025 | Code Generation, Mathematical Reasoning | Code Available | 13 |
| Absolute Zero: Reinforced Self-play Reasoning with Zero Data | May 6, 2025 | Mathematical Reasoning | Code Available | 11 |
| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Jun 17, 2024 | 16k, Language Modeling | Code Available | 9 |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Feb 5, 2024 | Arithmetic Reasoning, Math | Code Available | 9 |
| s1: Simple test-time scaling | Jan 31, 2025 | Language Modeling | Code Available | 9 |
| NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context? | Jul 16, 2024 | 4k, 8k | Code Available | 9 |
| O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | Nov 25, 2024 | Hallucination, Knowledge Distillation | Code Available | 7 |
| EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty | Jan 26, 2024 | Code Generation, Instruction Following | Code Available | 7 |