| AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset | Apr 23, 2025 | MathMathematical Reasoning | CodeCode Available | 4 |
| MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision | May 19, 2025 | MathMathematical Reasoning | CodeCode Available | 4 |
| Knowledge Fusion of Large Language Models | Jan 19, 2024 | Code GenerationCommon Sense Reasoning | CodeCode Available | 4 |
| ChatGPT for Robotics: Design Principles and Model Abilities | Feb 20, 2023 | Mathematical ReasoningPrompt Engineering | CodeCode Available | 4 |
| OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | Oct 2, 2024 | Arithmetic ReasoningLarge Language Model | CodeCode Available | 4 |
| ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | Feb 10, 2025 | Hierarchical Reinforcement LearningLanguage Modeling | CodeCode Available | 4 |
| MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | Jul 11, 2024 | Contrastive LearningLanguage Modelling | CodeCode Available | 4 |
| Galactica: A Large Language Model for Science | Nov 16, 2022 | AnachronismsBias Detection | CodeCode Available | 4 |
| SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Oct 11, 2024 | GSM8KMath | CodeCode Available | 4 |
| LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover | Jul 24, 2024 | Automated Theorem ProvingMath | CodeCode Available | 4 |
| How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition | Oct 9, 2023 | Code GenerationInstruction Following | CodeCode Available | 3 |
| Self-Refine: Iterative Refinement with Self-Feedback | Mar 30, 2023 | Mathematical ReasoningResponse Generation | CodeCode Available | 3 |
| Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Apr 29, 2025 | Domain GeneralizationMath | CodeCode Available | 3 |
| Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't | Mar 20, 2025 | Mathematical ReasoningReinforcement Learning (RL) | CodeCode Available | 3 |
| RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Mar 8, 2024 | Code GenerationHallucination | CodeCode Available | 3 |
| General-Reasoner: Advancing LLM Reasoning Across All Domains | May 20, 2025 | AllMath | CodeCode Available | 3 |
| Reasoning with Language Model Prompting: A Survey | Dec 19, 2022 | Arithmetic ReasoningCommon Sense Reasoning | CodeCode Available | 3 |
| Self-rewarding correction for mathematical reasoning | Feb 26, 2025 | Mathematical Reasoning | CodeCode Available | 3 |
| MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | May 13, 2024 | Data AugmentationGSM8K | CodeCode Available | 3 |
| PAL: Program-aided Language Models | Nov 18, 2022 | Arithmetic ReasoningGSM8K | CodeCode Available | 3 |
| MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem | May 20, 2025 | Mathematical Reasoningscientific discovery | CodeCode Available | 3 |
| MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs | Apr 1, 2025 | Knowledge GraphsMathematical Reasoning | CodeCode Available | 3 |
| MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning | May 20, 2024 | Continual PretrainingMathematical Reasoning | CodeCode Available | 3 |
| MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline | Jan 16, 2024 | GSM8KMath | CodeCode Available | 3 |
| DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning | Apr 15, 2025 | Mathematical ReasoningReinforcement Learning (RL) | CodeCode Available | 3 |