| DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning | May 29, 2025 | Automated Theorem Proving, Mathematical Reasoning | Code Available | 1 |
| OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Jul 13, 2024 | Benchmarking, Math | Code Available | 1 |
| Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning? | Jun 13, 2024 | Mathematical Reasoning, Question Answering | Code Available | 1 |
| A Dual-Space Framework for General Knowledge Distillation of Large Language Models | Apr 15, 2025 | Code Generation, General Knowledge | Code Available | 1 |
| A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level | Dec 31, 2021 | Few-Shot Learning, Language Modelling | Code Available | 1 |
| OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning | Nov 16, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| Learning Multi-Step Reasoning by Solving Arithmetic Tasks | Jun 2, 2023 | Math, Mathematical Reasoning | Code Available | 1 |
| Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models | Feb 20, 2024 | Mathematical Reasoning | Code Available | 1 |
| Crosslingual Reasoning through Test-Time Scaling | May 8, 2025 | Mathematical Reasoning | Code Available | 1 |
| Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents | Feb 18, 2024 | Mathematical Reasoning, Multi-hop Question Answering | Code Available | 1 |
| Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning | Feb 19, 2025 | Mathematical Reasoning | Code Available | 1 |
| CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization | Jul 8, 2025 | Active Learning, Automated Theorem Proving | Code Available | 1 |
| Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | Nov 29, 2024 | GSM8K, Math | Code Available | 1 |
| A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models | Oct 21, 2022 | Math, Mathematical Reasoning | Code Available | 1 |
| Evaluating Language Models for Mathematics through Interactions | Jun 2, 2023 | Language Modelling, Mathematical Problem-Solving | Code Available | 1 |
| RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics | May 18, 2025 | Mathematical Reasoning | Code Available | 1 |
| Learning From Mistakes Makes LLM Better Reasoner | Oct 31, 2023 | GSM8K, Math | Code Available | 1 |
| Let's Verify Math Questions Step by Step | May 20, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| Auto-Regressive Next-Token Predictors are Universal Learners | Sep 13, 2023 | Mathematical Reasoning, Text Generation | Code Available | 1 |
| CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought | Feb 24, 2025 | Mathematical Reasoning, Misinformation | Code Available | 1 |
| AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence | Feb 19, 2025 | Code Generation, Decision Making | Code Available | 1 |
| A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram | Feb 22, 2023 | Geometry Problem Solving, Mathematical Reasoning | Code Available | 1 |
| Control LLM: Controlled Evolution for Intelligence Retention in LLM | Jan 19, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| Large Language Models for Multi-Robot Systems: A Survey | Feb 6, 2025 | Action Generation, Benchmarking | Code Available | 1 |
| Lila: A Unified Benchmark for Mathematical Reasoning | Oct 31, 2022 | Diversity, Mathematical Reasoning | Code Available | 1 |