| Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models | Feb 20, 2024 | Mathematical Reasoning | Code Available | 1 | 5 |
| Lila: A Unified Benchmark for Mathematical Reasoning | Oct 31, 2022 | Diversity, Mathematical Reasoning | Code Available | 1 | 5 |
| Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT? | Apr 16, 2025 | Mathematical Reasoning | Code Available | 1 | 5 |
| GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language Models | Apr 13, 2025 | Mathematical Reasoning | Code Available | 1 | 5 |
| Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents | Feb 18, 2024 | Mathematical Reasoning, Multi-hop Question Answering | Code Available | 1 | 5 |
| Learning From Mistakes Makes LLM Better Reasoner | Oct 31, 2023 | GSM8K, Math | Code Available | 1 | 5 |
| Semi-Supervised Learning via Weight-aware Distillation under Class Distribution Mismatch | Aug 23, 2023 | Mathematical Reasoning | Code Available | 1 | 5 |
| CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning | Oct 14, 2024 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| R-PRM: Reasoning-Driven Process Reward Modeling | Mar 27, 2025 | Mathematical Reasoning | Code Available | 1 | 5 |
| Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification | Jun 5, 2025 | Automated Theorem Proving, Hallucination | Code Available | 1 | 5 |
| GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning | May 30, 2021 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| Alice: Proactive Learning with Teacher's Demonstrations for Weak-to-Strong Generalization | Apr 9, 2025 | Logical Reasoning, Mathematical Reasoning | Code Available | 1 | 5 |
| KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning | May 22, 2025 | Mathematical Reasoning, reinforcement-learning | Code Available | 1 | 5 |
| Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning | Sep 19, 2024 | Form, Instruction Following | Code Available | 1 | 5 |
| JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models | May 23, 2024 | Knowledge Distillation, Math | Code Available | 1 | 5 |
| Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning | May 10, 2021 | Arithmetic Reasoning, Geometry Problem Solving | Code Available | 1 | 5 |
| MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization | Jan 12, 2024 | Mathematical Reasoning | Code Available | 1 | 5 |
| Large Language Models for Multi-Robot Systems: A Survey | Feb 6, 2025 | Action Generation, Benchmarking | Code Available | 1 | 5 |
| LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning | Jan 15, 2021 | Inductive Bias, Mathematical Reasoning | Code Available | 1 | 5 |
| GOLD: Geometry Problem Solver with Natural Language Description | May 1, 2024 | Math | Code Available | 1 | 5 |
| Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models | May 20, 2025 | Instruction Following, Mathematical Reasoning | Code Available | 1 | 5 |
| Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation | Aug 16, 2023 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression | Dec 6, 2022 | Geometry Problem Solving, Logical Reasoning | Code Available | 1 | 5 |
| FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models | Jul 1, 2024 | Mathematical Reasoning | Code Available | 0 | 5 |
| A Survey on Mathematical Reasoning and Optimization with Large Language Models | Mar 22, 2025 | Automated Theorem Proving, Heuristic Search | Code Available | 0 | 5 |
| Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Feb 11, 2025 | Code Generation, Math | Code Available | 0 | 5 |
| Mathematical Formalized Problem Solving and Theorem Proving in Different Fields in Lean 4 | Sep 9, 2024 | Abstract Algebra, Automated Theorem Proving | Code Available | 0 | 5 |
| Reasoning with Transformer-based Models: Deep Learning, but Shallow Reasoning | Jun 22, 2021 | Deep Learning, Logical Reasoning | Code Available | 0 | 5 |
| Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models | Mar 27, 2025 | Data Visualization, Math | Code Available | 0 | 5 |
| Reverse Operation based Data Augmentation for Solving Math Word Problems | Oct 4, 2020 | Data Augmentation, Math | Code Available | 0 | 5 |
| CER: Confidence Enhanced Reasoning in LLMs | Feb 20, 2025 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment | Nov 18, 2024 | Mathematical Reasoning | Code Available | 0 | 5 |
| AI-Assisted Generation of Difficult Math Questions | Jul 30, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| A Survey of Deep Learning for Geometry Problem Solving | Jul 16, 2025 | Deep Learning, Geometry Problem Solving | Code Available | 0 | 5 |
| Reasoning over Uncertain Text by Generative Large Language Models | Feb 14, 2024 | Decision Making, Mathematical Reasoning | Code Available | 0 | 5 |
| Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting | Feb 9, 2023 | Mathematical Reasoning, Natural Language Inference | Code Available | 0 | 5 |
| Probability-Consistent Preference Optimization for Enhanced LLM Reasoning | May 29, 2025 | Mathematical Reasoning | Code Available | 0 | 5 |
| Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark | Oct 6, 2024 | Mathematical Reasoning, Spatial Reasoning | Code Available | 0 | 5 |
| Planning and Editing What You Retrieve for Enhanced Tool Learning | Mar 30, 2024 | Mathematical Reasoning, Retrieval | Code Available | 0 | 5 |
| Process-based Self-Rewarding Language Models | Mar 5, 2025 | Mathematical Reasoning | Code Available | 0 | 5 |
| Can LLMs Solve longer Math Word Problems Better? | May 23, 2024 | Data Augmentation, Math | Code Available | 0 | 5 |
| Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement | Feb 18, 2024 | Mathematical Reasoning, Text Generation | Code Available | 0 | 5 |
| Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models | Apr 17, 2024 | Form, Language Model Evaluation | Code Available | 0 | 5 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Mar 30, 2024 | Math, Mathematical Problem-Solving | Code Available | 0 | 5 |
| Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction | Jun 2, 2024 | Mathematical Reasoning | Code Available | 0 | 5 |
| Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs | Jun 11, 2025 | Mathematical Reasoning | Code Available | 0 | 5 |
| Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | May 26, 2025 | Hallucination, Math | Code Available | 0 | 5 |
| NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors | Jun 12, 2025 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Can A Gamer Train A Mathematical Reasoning Model? | Jun 10, 2025 | GPU, Mathematical Reasoning | Code Available | 0 | 5 |
| Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Oct 16, 2024 | All, GSM8K | Code Available | 0 | 5 |