| ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Jun 18, 2024 | All, GSM8K | Code Available | 14 | 5 |
| Qwen2.5 Technical Report | Dec 19, 2024 | Common Sense Reasoning | Code Available | 13 | 5 |
| Qwen2.5-Coder Technical Report | Sep 18, 2024 | Code Generation | Code Available | 11 | 5 |
| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Jun 17, 2024 | 16k, Language Modeling | Code Available | 9 | 5 |
| General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model | Sep 3, 2024 | Decoder, Math | Code Available | 9 | 5 |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Feb 5, 2024 | Arithmetic Reasoning, Math | Code Available | 9 | 5 |
| s1: Simple test-time scaling | Jan 31, 2025 | Language Modeling, Language Modelling | Code Available | 9 | 5 |
| AgentRxiv: Towards Collaborative Autonomous Research | Mar 23, 2025 | Math, scientific discovery | Code Available | 9 | 5 |
| O1 Replication Journey: A Strategic Progress Report -- Part 1 | Oct 8, 2024 | Math, scientific discovery | Code Available | 7 | 5 |
| Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback | Jun 13, 2024 | Instruction Following, Math | Code Available | 7 | 5 |
| rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking | Jan 8, 2025 | Math | Code Available | 7 | 5 |
| OpenThoughts: Data Recipes for Reasoning Models | Jun 4, 2025 | Math | Code Available | 7 | 5 |
| LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! | Feb 11, 2025 | Large Language Model, Math | Code Available | 7 | 5 |
| DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | Oct 5, 2023 | Language Modeling, Language Modelling | Code Available | 7 | 5 |
| S*: Test Time Scaling for Code Generation | Feb 20, 2025 | Code Generation, Math | Code Available | 7 | 5 |
| Kimi k1.5: Scaling Reinforcement Learning with LLMs | Jan 22, 2025 | Math, reinforcement-learning | Code Available | 7 | 5 |
| StarCoder 2 and The Stack v2: The Next Generation | Feb 29, 2024 | Code Completion, Code Generation | Code Available | 7 | 5 |
| AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | May 30, 2025 | GPU, Math | Code Available | 7 | 5 |
| xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | Mar 17, 2025 | Mamba, Math | Code Available | 7 | 5 |
| Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models | May 6, 2023 | Math | Code Available | 7 | 5 |
| EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Jul 4, 2025 | Code Generation, Math | Code Available | 7 | 5 |
| SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild | Mar 24, 2025 | Instruction Following, Math | Code Available | 7 | 5 |
| TTRL: Test-Time Reinforcement Learning | Apr 22, 2025 | Math, reinforcement-learning | Code Available | 7 | 5 |
| Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Feb 20, 2025 | Math, reinforcement-learning | Code Available | 7 | 5 |
| AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | Jun 1, 2023 | Autonomous Driving, Cloud Computing | Code Available | 6 | 5 |
| Qwen Technical Report | Sep 28, 2023 | Language Modeling, Language Modelling | Code Available | 6 | 5 |
| Mistral 7B | Oct 10, 2023 | answerability prediction, Arithmetic Reasoning | Code Available | 6 | 5 |
| GPT-4 Technical Report | Mar 15, 2023 | answerability prediction, Arithmetic Reasoning | Code Available | 6 | 5 |
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Jan 28, 2022 | Common Sense Reasoning, GSM8K | Code Available | 6 | 5 |
| Process Reinforcement through Implicit Rewards | Feb 3, 2025 | Math, Reinforcement Learning (RL) | Code Available | 5 | 5 |
| LiveBench: A Challenging, Contamination-Limited LLM Benchmark | Jun 27, 2024 | Articles, Instruction Following | Code Available | 5 | 5 |
| MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit | Apr 22, 2024 | Math | Code Available | 5 | 5 |
| OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models | Oct 12, 2024 | Math, reinforcement-learning | Code Available | 5 | 5 |
| Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B | Jun 11, 2024 | Decision Making, GSM8K | Code Available | 5 | 5 |
| Common 7B Language Models Already Possess Strong Math Capabilities | Mar 7, 2024 | GSM8K, Math | Code Available | 5 | 5 |
| Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | Mar 9, 2025 | Math, Multimodal Reasoning | Code Available | 5 | 5 |
| WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Aug 18, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 5 | 5 |
| LIMO: Less is More for Reasoning | Feb 5, 2025 | Math, Mathematical Reasoning | Code Available | 5 | 5 |
| Evolutionary Optimization of Model Merging Recipes | Mar 19, 2024 | Evolutionary Algorithms, Math | Code Available | 5 | 5 |
| Free Process Rewards without Process Labels | Dec 2, 2024 | Math | Code Available | 5 | 5 |
| Reinforcement Learning from Human Feedback | Apr 16, 2025 | Math, Philosophy | Code Available | 5 | 5 |
| Dive into Deep Learning | Jun 21, 2021 | Deep Learning, Math | Code Available | 4 | 5 |
| LLaMA Pro: Progressive LLaMA with Block Expansion | Jan 4, 2024 | Instruction Following, Math | Code Available | 4 | 5 |
| Lean Workbook: A large-scale Lean problem set formalized from natural language math problems | Jun 6, 2024 | Automated Theorem Proving, Math | Code Available | 4 | 5 |
| LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover | Jul 24, 2024 | Automated Theorem Proving, Math | Code Available | 4 | 5 |
| Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Aug 12, 2024 | GSM8K, Math | Code Available | 4 | 5 |
| Let's Verify Step by Step | May 31, 2023 | Active Learning, Math | Code Available | 4 | 5 |
| Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | Mar 13, 2025 | Domain Generalization, Math | Code Available | 4 | 5 |
| InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Oct 21, 2024 | Automated Theorem Proving, CPU | Code Available | 4 | 5 |
| How is ChatGPT's behavior changing over time? | Jul 18, 2023 | Code Generation, Language Modelling | Code Available | 4 | 5 |