| ThoughtSource: A central hub for large language model reasoning data | Jan 27, 2023 | Language Modeling, Language Modelling | Code Available | 3 |
| Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks | Nov 22, 2022 | Math | Code Available | 3 |
| PAL: Program-aided Language Models | Nov 18, 2022 | Arithmetic Reasoning, GSM8K | Code Available | 3 |
| SymForce: Symbolic Computation and Code Generation for Robotics | Apr 17, 2022 | Code Generation, Math | Code Available | 3 |
| Training Verifiers to Solve Math Word Problems | Oct 27, 2021 | GSM8K, Math | Code Available | 3 |
| SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning | Jun 30, 2025 | Math, Multi-agent Reinforcement Learning | Code Available | 2 |
| OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling | Jun 25, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Jun 23, 2025 | GPU, Large Language Model | Code Available | 2 |
| Essential-Web v1.0: 24T tokens of organized web data | Jun 17, 2025 | Math | Code Available | 2 |
| TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Jun 13, 2025 | Math, reinforcement-learning | Code Available | 2 |
| Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning | Jun 11, 2025 | Image Captioning, Math | Code Available | 2 |
| AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Jun 10, 2025 | Math | Code Available | 2 |
| Play to Generalize: Learning to Reason Through Game Play | Jun 9, 2025 | Domain Generalization, Math | Code Available | 2 |
| MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Jun 5, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning | Jun 2, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO | May 28, 2025 | Math, Reinforcement Learning (RL) | Code Available | 2 |
| Reinforcing General Reasoning without Verifiers | May 27, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing | May 27, 2025 | Math | Code Available | 2 |
| MAS-Zero: Designing Multi-Agent Systems with Zero Supervision | May 26, 2025 | Math, Problem Decomposition | Code Available | 2 |
| WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning | May 22, 2025 | Math, Reinforcement Learning (RL) | Code Available | 2 |
| RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning | May 21, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Meta-Design Matters: A Self-Design Multi-Agent System | May 21, 2025 | Math, Problem Decomposition | Code Available | 2 |
| AdaptThink: Reasoning Models Can Learn When to Think | May 19, 2025 | Math | Code Available | 2 |
| Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | May 19, 2025 | GSM8K, Math | Code Available | 2 |
| Synthetic Data RL: Task Definition Is All You Need | May 18, 2025 | All, GSM8K | Code Available | 2 |
| Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | May 15, 2025 | Math, reinforcement-learning | Code Available | 2 |
| Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | May 12, 2025 | Math, Mathematical Problem-Solving | Code Available | 2 |
| RM-R1: Reward Modeling as Reasoning | May 5, 2025 | Math, Reinforcement Learning (RL) | Code Available | 2 |
| Process Reward Models That Think | Apr 23, 2025 | Math | Code Available | 2 |
| Dynamic Early Exit in Reasoning Models | Apr 22, 2025 | GSM8K, Math | Code Available | 2 |
| Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction | Apr 21, 2025 | Math | Code Available | 2 |
| Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning | Apr 21, 2025 | All, Form | Code Available | 2 |
| VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning | Apr 10, 2025 | Math, Multimodal Reasoning | Code Available | 2 |
| Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization | Apr 8, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | Apr 7, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models | Apr 7, 2025 | Dialogue Evaluation, Fairness | Code Available | 2 |
| Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models | Apr 7, 2025 | Math, Quantization | Code Available | 2 |
| MegaMath: Pushing the Limits of Open Math Corpora | Apr 3, 2025 | Diversity, Math | Code Available | 2 |
| Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Mar 31, 2025 | Math, Spatial Reasoning | Code Available | 2 |
| CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Mar 28, 2025 | GPU, GSM8K | Code Available | 2 |
| Learning to Reason for Long-Form Story Generation | Mar 28, 2025 | Form, Math | Code Available | 2 |
| Reasoning to Learn from Latent Thoughts | Mar 24, 2025 | Math, Text Generation | Code Available | 2 |
| FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models | Mar 21, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Feb 26, 2025 | Code Generation, HumanEval | Code Available | 2 |
| Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Feb 24, 2025 | GSM8K, Math | Code Available | 2 |
| SIFT: Grounding LLM Reasoning in Contexts via Stickers | Feb 19, 2025 | GSM8K, Math | Code Available | 2 |
| S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning | Feb 18, 2025 | Math | Code Available | 2 |
| On the Emergence of Thinking in LLMs I: Searching for the Right Intuition | Feb 10, 2025 | Math | Code Available | 2 |
| Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Feb 10, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity? | Feb 7, 2025 | 8k, Information Retrieval | Code Available | 2 |