| LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Jul 6, 2024 | Logical Reasoning, Mathematical Reasoning | Code Available | 1 | 5 |
| RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics | May 18, 2025 | Mathematical Reasoning | Code Available | 1 | 5 |
| Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Jul 14, 2025 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| RePO: Replay-Enhanced Policy Optimization | Jun 11, 2025 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling | May 25, 2025 | Computational Efficiency, Mathematical Reasoning | Code Available | 1 | 5 |
| Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems | Oct 24, 2024 | Mathematical Reasoning | Code Available | 1 | 5 |
| PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Mar 4, 2025 | GSM8K, Math | Code Available | 1 | 5 |
| LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning | Jan 15, 2021 | Inductive Bias, Mathematical Reasoning | Code Available | 1 | 5 |
| Lila: A Unified Benchmark for Mathematical Reasoning | Oct 31, 2022 | Diversity, Mathematical Reasoning | Code Available | 1 | 5 |
| Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning | Feb 19, 2025 | Mathematical Reasoning | Code Available | 1 | 5 |
| Let's Verify Math Questions Step by Step | May 20, 2025 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning? | Jun 13, 2024 | Mathematical Reasoning, Question Answering | Code Available | 1 | 5 |
| Process-Driven Autoformalization in Lean 4 | Jun 4, 2024 | Mathematical Reasoning | Code Available | 1 | 5 |
| Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models | Feb 20, 2024 | Mathematical Reasoning | Code Available | 1 | 5 |
| Learning Multi-Step Reasoning by Solving Arithmetic Tasks | Jun 2, 2023 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| PACE: Marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization | Sep 25, 2024 | 8k, Domain Adaptation | Code Available | 1 | 5 |
| Peano: Learning Formal Mathematical Reasoning | Nov 29, 2022 | Automated Theorem Proving, Mathematical Reasoning | Code Available | 1 | 5 |
| Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations | Oct 31, 2023 | GSM8K, Math | Code Available | 1 | 5 |
| Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Jun 16, 2023 | Benchmarking, Evidence Selection | Code Available | 1 | 5 |
| Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents | Feb 18, 2024 | Mathematical Reasoning, Multi-hop Question Answering | Code Available | 1 | 5 |
| Large Language Models for Multi-Robot Systems: A Survey | Feb 6, 2025 | Action Generation, Benchmarking | Code Available | 1 | 5 |
| Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Apr 24, 2025 | Benchmarking, Math | Code Available | 1 | 5 |
| Learning From Mistakes Makes LLM Better Reasoner | Oct 31, 2023 | GSM8K, Math | Code Available | 1 | 5 |
| A Reinforcement Learning Environment for Mathematical Reasoning via Program Synthesis | Jul 15, 2021 | Mathematical Reasoning, Program Synthesis | Code Available | 1 | 5 |
| Question Translation Training for Better Multilingual Reasoning | Jan 15, 2024 | Mathematical Reasoning, Translation | Code Available | 1 | 5 |
| Boosting MLLM Reasoning with Text-Debiased Hint-GRPO | Mar 31, 2025 | Mathematical Reasoning, Multimodal Reasoning | Code Available | 1 | 5 |
| JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models | May 23, 2024 | Knowledge Distillation, Math | Code Available | 1 | 5 |
| Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Jan 11, 2025 | Math, Mathematical Problem-Solving | Code Available | 1 | 5 |
| Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning | May 10, 2021 | Arithmetic Reasoning, Geometry Problem Solving | Code Available | 1 | 5 |
| Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning | Sep 29, 2022 | Logical Reasoning, Math | Code Available | 1 | 5 |
| KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning | May 22, 2025 | Mathematical Reasoning, reinforcement-learning | Code Available | 1 | 5 |
| Implicit Reasoning in Transformers is Reasoning through Shortcuts | Mar 10, 2025 | Mathematical Reasoning | Code Available | 1 | 5 |
| A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Jul 11, 2025 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning | Oct 25, 2021 | Arithmetic Reasoning, Mathematical Question Answering | Code Available | 1 | 5 |
| DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models | May 14, 2025 | Diversity, Mathematical Reasoning | Code Available | 1 | 5 |
| DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | Jul 4, 2024 | Avg, GSM8K | Code Available | 1 | 5 |
| Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining | Apr 10, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Code Available | 1 | 5 |
| A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods | Feb 3, 2025 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data | Feb 14, 2024 | Automated Theorem Proving, Language Modelling | Code Available | 1 | 5 |
| H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables | Jun 29, 2024 | Fact Verification, Mathematical Reasoning | Code Available | 1 | 5 |
| Natural Language Reasoning, A Survey | Mar 26, 2023 | Logical Reasoning, Mathematical Reasoning | Code Available | 1 | 5 |
| Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent | Dec 14, 2023 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Oct 13, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| IsarStep: a Benchmark for High-level Mathematical Reasoning | Jun 13, 2020 | Mathematical Proofs, Mathematical Reasoning | Code Available | 1 | 5 |
| Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | May 30, 2025 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | Oct 7, 2024 | GSM8K, Logical Reasoning | Code Available | 1 | 5 |
| MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Aug 30, 2024 | Image Captioning, Language Modeling | Code Available | 1 | 5 |
| Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation | Mar 17, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Code Available | 1 | 5 |
| Breaking the Data Barrier -- Building GUI Agents Through Task Generalization | Apr 14, 2025 | Mathematical Reasoning, Multimodal Reasoning | Code Available | 1 | 5 |
| Diagram Formalization Enhanced Multi-Modal Geometry Problem Solver | Sep 6, 2024 | Geometry Problem Solving, Mathematical Reasoning | Code Available | 1 | 5 |